* [PATCH RFC 01/12] PVHv2 Dom0
@ 2016-07-29 16:28 Roger Pau Monne
  2016-07-29 16:28 ` [PATCH RFC 01/12] x86/paging: introduce paging_set_allocation Roger Pau Monne
                   ` (13 more replies)
  0 siblings, 14 replies; 49+ messages in thread
From: Roger Pau Monne @ 2016-07-29 16:28 UTC (permalink / raw)
  To: xen-devel

Hello,

This is a very rough implementation of a PVHv2 Dom0. There are still a bunch 
of things that will not work properly (like SR-IOV, MSI, MSI-X...), but it 
*should* be able to boot a Dom0 kernel and it doesn't require using PIRQs.

There are some changes compared to a traditional PV or PVH Dom0:

 - An emulated local APIC (for each vCPU) and IO APIC is provided to Dom0.
 - At the start of day only the low 1MB and the ACPI e820 regions are mapped 
   into the guest using 1:1 mappings (for UEFI systems mapping the low 1MB 
   probably doesn't make any sense, but alas).
 - The MADT has been replaced in order to reflect the real topology seen by 
   Dom0.
 - In order to have the BARs of PCI devices mapped, Dom0 must call 
   PHYSDEVOP_pci_device_add (a rough sketch of this hypercall follows the 
   list). See the notes on patch 11 for more information (and what's 
   missing).
 - ATM only legacy PCI interrupts are supported. This is implemented by 
   intercepting the writes Dom0 makes to the emulated IO APIC and translating 
   them into writes to the real hardware. Note that the MSI and MSI-X 
   capabilities are _not_ masked from the PCI capabilities list, so the user 
   must avoid using them.
 - PCIe Enhanced Configuration Mechanism regions are not yet mapped into 
   Dom0 physmap.
 - Some ACPI tables are zapped (their signature is inverted) to prevent Dom0 
   from poking at them; those are: HPET, SLIT, SRAT, MPST and PMTT.
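
For context, a rough sketch (not part of this series) of how a Dom0 kernel 
might register a PCI device via PHYSDEVOP_pci_device_add so that Xen maps its 
BARs. The HYPERVISOR_physdev_op wrapper and the exact struct layout are as I 
recall them from the Linux/Xen public headers, so treat it as illustrative 
only:

    #include <xen/interface/physdev.h>  /* struct physdev_pci_device_add */

    /* Tell Xen about the PCI device at seg:bus:devfn so its BARs get mapped. */
    static int report_device_to_xen(uint16_t seg, uint8_t bus, uint8_t devfn)
    {
        struct physdev_pci_device_add add = {
            .seg   = seg,
            .bus   = bus,
            .devfn = devfn,
            /* .flags may request extra info (PXM, virtfn); none needed here. */
        };

        /* HYPERVISOR_physdev_op is the kernel's physdev hypercall wrapper. */
        return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_add, &add);
    }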

This is still very experimental. I've been able to boot a FreeBSD Dom0 with 
2GB and 4GB of memory assigned to it, but if I try to use 6GB Xen gets an NMI 
(I've got no idea why yet). I don't think it's worth delaying this any more, 
because it's going to take me some time to finish all this work (MSI, 
MSI-X, bug hunting...), and in the meantime people can already take a look 
at what's done (or half-done).

I've pushed the whole series to my git repository:

git://xenbits.xen.org/people/royger/xen.git pvhv2_dom0_rfc

It contains two patches from Boris and Anthony that are prerequisites for 
this series.

As usual, thanks for taking the time to look into it, hope it doesn't make 
your eyes bleed much (slight bleeding is expected).

Roger.


* [PATCH RFC 01/12] x86/paging: introduce paging_set_allocation
  2016-07-29 16:28 [PATCH RFC 01/12] PVHv2 Dom0 Roger Pau Monne
@ 2016-07-29 16:28 ` Roger Pau Monne
  2016-07-29 16:47   ` Andrew Cooper
  2016-08-01  8:57   ` Tim Deegan
  2016-07-29 16:28 ` [PATCH RFC 02/12] xen/x86: split the setup of Dom0 permissions to a function Roger Pau Monne
                   ` (12 subsequent siblings)
  13 siblings, 2 replies; 49+ messages in thread
From: Roger Pau Monne @ 2016-07-29 16:28 UTC (permalink / raw)
  To: xen-devel
  Cc: George Dunlap, Andrew Cooper, Tim Deegan, Jan Beulich, Roger Pau Monne

... and remove hap_set_alloc_for_pvh_dom0.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Tim Deegan <tim@xen.org>
---
 xen/arch/x86/domain_build.c     |  7 +++----
 xen/arch/x86/mm/hap/hap.c       | 14 +-------------
 xen/arch/x86/mm/paging.c        | 16 ++++++++++++++++
 xen/arch/x86/mm/shadow/common.c |  5 ++---
 xen/include/asm-x86/hap.h       |  3 ++-
 xen/include/asm-x86/paging.h    |  3 +++
 xen/include/asm-x86/shadow.h    |  6 ++++++
 7 files changed, 33 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index 0a02d65..d7d4afc 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -35,7 +35,6 @@
 #include <asm/setup.h>
 #include <asm/bzimage.h> /* for bzimage_parse */
 #include <asm/io_apic.h>
-#include <asm/hap.h>
 #include <asm/hpet.h>
 
 #include <public/version.h>
@@ -1383,15 +1382,15 @@ int __init construct_dom0(
                          nr_pages);
     }
 
-    if ( is_pvh_domain(d) )
-        hap_set_alloc_for_pvh_dom0(d, dom0_paging_pages(d, nr_pages));
-
     /*
      * We enable paging mode again so guest_physmap_add_page will do the
      * right thing for us.
      */
     d->arch.paging.mode = save_pvh_pg_mode;
 
+    if ( is_pvh_domain(d) )
+        paging_set_allocation(d, dom0_paging_pages(d, nr_pages));
+
     /* Write the phys->machine and machine->phys table entries. */
     for ( pfn = 0; pfn < count; pfn++ )
     {
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index 3218fa2..b49b38f 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -334,7 +334,7 @@ hap_get_allocation(struct domain *d)
 
 /* Set the pool of pages to the required number of pages.
  * Returns 0 for success, non-zero for failure. */
-static unsigned int
+unsigned int
 hap_set_allocation(struct domain *d, unsigned int pages, int *preempted)
 {
     struct page_info *pg;
@@ -638,18 +638,6 @@ int hap_domctl(struct domain *d, xen_domctl_shadow_op_t *sc,
     }
 }
 
-void __init hap_set_alloc_for_pvh_dom0(struct domain *d,
-                                       unsigned long hap_pages)
-{
-    int rc;
-
-    paging_lock(d);
-    rc = hap_set_allocation(d, hap_pages, NULL);
-    paging_unlock(d);
-
-    BUG_ON(rc);
-}
-
 static const struct paging_mode hap_paging_real_mode;
 static const struct paging_mode hap_paging_protected_mode;
 static const struct paging_mode hap_paging_pae_mode;
diff --git a/xen/arch/x86/mm/paging.c b/xen/arch/x86/mm/paging.c
index 107fc8c..1b270df 100644
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -953,6 +953,22 @@ void paging_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
         safe_write_pte(p, new);
 }
 
+int paging_set_allocation(struct domain *d, unsigned long pages)
+{
+    int rc;
+
+    ASSERT(paging_mode_enabled(d));
+
+    paging_lock(d);
+    if ( hap_enabled(d) )
+        rc = hap_set_allocation(d, pages, NULL);
+    else
+        rc = sh_set_allocation(d, pages, NULL);
+    paging_unlock(d);
+
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index c22362f..452e22e 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -1604,9 +1604,8 @@ shadow_free_p2m_page(struct domain *d, struct page_info *pg)
  * Input will be rounded up to at least shadow_min_acceptable_pages(),
  * plus space for the p2m table.
  * Returns 0 for success, non-zero for failure. */
-static unsigned int sh_set_allocation(struct domain *d,
-                                      unsigned int pages,
-                                      int *preempted)
+unsigned int sh_set_allocation(struct domain *d, unsigned int pages,
+                               int *preempted)
 {
     struct page_info *sp;
     unsigned int lower_bound;
diff --git a/xen/include/asm-x86/hap.h b/xen/include/asm-x86/hap.h
index c613836..e3c9c98 100644
--- a/xen/include/asm-x86/hap.h
+++ b/xen/include/asm-x86/hap.h
@@ -46,7 +46,8 @@ int   hap_track_dirty_vram(struct domain *d,
                            XEN_GUEST_HANDLE_64(uint8) dirty_bitmap);
 
 extern const struct paging_mode *hap_paging_get_mode(struct vcpu *);
-void hap_set_alloc_for_pvh_dom0(struct domain *d, unsigned long num_pages);
+unsigned int hap_set_allocation(struct domain *d, unsigned int pages,
+				int *preempted);
 
 #endif /* XEN_HAP_H */
 
diff --git a/xen/include/asm-x86/paging.h b/xen/include/asm-x86/paging.h
index a1401ab..6598007 100644
--- a/xen/include/asm-x86/paging.h
+++ b/xen/include/asm-x86/paging.h
@@ -347,6 +347,9 @@ void pagetable_dying(struct domain *d, paddr_t gpa);
 void paging_dump_domain_info(struct domain *d);
 void paging_dump_vcpu_info(struct vcpu *v);
 
+/* Set pool of pages for paging. */
+int paging_set_allocation(struct domain *d, unsigned long pages);
+
 #endif /* XEN_PAGING_H */
 
 /*
diff --git a/xen/include/asm-x86/shadow.h b/xen/include/asm-x86/shadow.h
index 6d0aefb..bc068ec 100644
--- a/xen/include/asm-x86/shadow.h
+++ b/xen/include/asm-x86/shadow.h
@@ -83,6 +83,10 @@ void sh_remove_shadows(struct domain *d, mfn_t gmfn, int fast, int all);
 /* Discard _all_ mappings from the domain's shadows. */
 void shadow_blow_tables_per_domain(struct domain *d);
 
+/* Set the pool of shadow pages to the required number of pages. */
+unsigned int sh_set_allocation(struct domain *d, unsigned int pages,
+                               int *preempted);
+
 #else /* !CONFIG_SHADOW_PAGING */
 
 #define shadow_teardown(d, p) ASSERT(is_pv_domain(d))
@@ -91,6 +95,8 @@ void shadow_blow_tables_per_domain(struct domain *d);
     ({ ASSERT(is_pv_domain(d)); -EOPNOTSUPP; })
 #define shadow_track_dirty_vram(d, begin_pfn, nr, bitmap) \
     ({ ASSERT_UNREACHABLE(); -EOPNOTSUPP; })
+#define sh_set_allocation(d, pages, preempted) \
+    ({ ASSERT_UNREACHABLE(); -EOPNOTSUPP; })
 
 static inline void sh_remove_shadows(struct domain *d, mfn_t gmfn,
                                      bool_t fast, bool_t all) {}
-- 
2.7.4 (Apple Git-66)



* [PATCH RFC 02/12] xen/x86: split the setup of Dom0 permissions to a function
  2016-07-29 16:28 [PATCH RFC 01/12] PVHv2 Dom0 Roger Pau Monne
  2016-07-29 16:28 ` [PATCH RFC 01/12] x86/paging: introduce paging_set_allocation Roger Pau Monne
@ 2016-07-29 16:28 ` Roger Pau Monne
  2016-07-29 16:28 ` [PATCH RFC 03/12] xen/x86: allow the emulated APICs to be enabled for the hardware domain Roger Pau Monne
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Roger Pau Monne @ 2016-07-29 16:28 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

So that it can also be used by the PVH-specific domain builder. This is just
code motion; it should not introduce any functional change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/domain_build.c | 164 +++++++++++++++++++++++---------------------
 1 file changed, 86 insertions(+), 78 deletions(-)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index d7d4afc..09d79be 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -869,6 +869,89 @@ static __init void setup_pv_physmap(struct domain *d, unsigned long pgtbl_pfn,
     unmap_domain_page(l4start);
 }
 
+static int __init setup_permissions(struct domain *d)
+{
+    unsigned long mfn;
+    int i, rc = 0;
+
+    /* The hardware domain is initially permitted full I/O capabilities. */
+    rc |= ioports_permit_access(d, 0, 0xFFFF);
+    rc |= iomem_permit_access(d, 0UL, (1UL << (paddr_bits - PAGE_SHIFT)) - 1);
+    rc |= irqs_permit_access(d, 1, nr_irqs_gsi - 1);
+
+    /*
+     * Modify I/O port access permissions.
+     */
+    /* Master Interrupt Controller (PIC). */
+    rc |= ioports_deny_access(d, 0x20, 0x21);
+    /* Slave Interrupt Controller (PIC). */
+    rc |= ioports_deny_access(d, 0xA0, 0xA1);
+    /* Interval Timer (PIT). */
+    rc |= ioports_deny_access(d, 0x40, 0x43);
+    /* PIT Channel 2 / PC Speaker Control. */
+    rc |= ioports_deny_access(d, 0x61, 0x61);
+    /* ACPI PM Timer. */
+    if ( pmtmr_ioport )
+        rc |= ioports_deny_access(d, pmtmr_ioport, pmtmr_ioport + 3);
+    /* PCI configuration space (NB. 0xcf8 has special treatment). */
+    rc |= ioports_deny_access(d, 0xcfc, 0xcff);
+    /* Command-line I/O ranges. */
+    process_dom0_ioports_disable(d);
+
+    /*
+     * Modify I/O memory access permissions.
+     */
+    /* Local APIC. */
+    if ( mp_lapic_addr != 0 )
+    {
+        mfn = paddr_to_pfn(mp_lapic_addr);
+        rc |= iomem_deny_access(d, mfn, mfn);
+    }
+    /* I/O APICs. */
+    for ( i = 0; i < nr_ioapics; i++ )
+    {
+        mfn = paddr_to_pfn(mp_ioapics[i].mpc_apicaddr);
+        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+            rc |= iomem_deny_access(d, mfn, mfn);
+    }
+    /* MSI range. */
+    rc |= iomem_deny_access(d, paddr_to_pfn(MSI_ADDR_BASE_LO),
+                            paddr_to_pfn(MSI_ADDR_BASE_LO +
+                                         MSI_ADDR_DEST_ID_MASK));
+    /* HyperTransport range. */
+    if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
+        rc |= iomem_deny_access(d, paddr_to_pfn(0xfdULL << 32),
+                                paddr_to_pfn((1ULL << 40) - 1));
+
+    /* Remove access to E820_UNUSABLE I/O regions above 1MB. */
+    for ( i = 0; i < e820.nr_map; i++ )
+    {
+        unsigned long sfn, efn;
+        sfn = max_t(unsigned long, paddr_to_pfn(e820.map[i].addr), 0x100ul);
+        efn = paddr_to_pfn(e820.map[i].addr + e820.map[i].size - 1);
+        if ( (e820.map[i].type == E820_UNUSABLE) &&
+             (e820.map[i].size != 0) &&
+             (sfn <= efn) )
+            rc |= iomem_deny_access(d, sfn, efn);
+    }
+
+    /* Prevent access to HPET */
+    if ( hpet_address )
+    {
+        u8 prot_flags = hpet_flags & ACPI_HPET_PAGE_PROTECT_MASK;
+
+        mfn = paddr_to_pfn(hpet_address);
+        if ( prot_flags == ACPI_HPET_PAGE_PROTECT4 )
+            rc |= iomem_deny_access(d, mfn, mfn);
+        else if ( prot_flags == ACPI_HPET_PAGE_PROTECT64 )
+            rc |= iomem_deny_access(d, mfn, mfn + 15);
+        else if ( ro_hpet )
+            rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
+    }
+
+    return rc;
+}
+
 int __init construct_dom0(
     struct domain *d,
     const module_t *image, unsigned long image_headroom,
@@ -1529,84 +1612,9 @@ int __init construct_dom0(
     if ( test_bit(XENFEAT_supervisor_mode_kernel, parms.f_required) )
         panic("Dom0 requires supervisor-mode execution");
 
-    rc = 0;
-
-    /* The hardware domain is initially permitted full I/O capabilities. */
-    rc |= ioports_permit_access(d, 0, 0xFFFF);
-    rc |= iomem_permit_access(d, 0UL, (1UL << (paddr_bits - PAGE_SHIFT)) - 1);
-    rc |= irqs_permit_access(d, 1, nr_irqs_gsi - 1);
-
-    /*
-     * Modify I/O port access permissions.
-     */
-    /* Master Interrupt Controller (PIC). */
-    rc |= ioports_deny_access(d, 0x20, 0x21);
-    /* Slave Interrupt Controller (PIC). */
-    rc |= ioports_deny_access(d, 0xA0, 0xA1);
-    /* Interval Timer (PIT). */
-    rc |= ioports_deny_access(d, 0x40, 0x43);
-    /* PIT Channel 2 / PC Speaker Control. */
-    rc |= ioports_deny_access(d, 0x61, 0x61);
-    /* ACPI PM Timer. */
-    if ( pmtmr_ioport )
-        rc |= ioports_deny_access(d, pmtmr_ioport, pmtmr_ioport + 3);
-    /* PCI configuration space (NB. 0xcf8 has special treatment). */
-    rc |= ioports_deny_access(d, 0xcfc, 0xcff);
-    /* Command-line I/O ranges. */
-    process_dom0_ioports_disable(d);
-
-    /*
-     * Modify I/O memory access permissions.
-     */
-    /* Local APIC. */
-    if ( mp_lapic_addr != 0 )
-    {
-        mfn = paddr_to_pfn(mp_lapic_addr);
-        rc |= iomem_deny_access(d, mfn, mfn);
-    }
-    /* I/O APICs. */
-    for ( i = 0; i < nr_ioapics; i++ )
-    {
-        mfn = paddr_to_pfn(mp_ioapics[i].mpc_apicaddr);
-        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
-            rc |= iomem_deny_access(d, mfn, mfn);
-    }
-    /* MSI range. */
-    rc |= iomem_deny_access(d, paddr_to_pfn(MSI_ADDR_BASE_LO),
-                            paddr_to_pfn(MSI_ADDR_BASE_LO +
-                                         MSI_ADDR_DEST_ID_MASK));
-    /* HyperTransport range. */
-    if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD )
-        rc |= iomem_deny_access(d, paddr_to_pfn(0xfdULL << 32),
-                                paddr_to_pfn((1ULL << 40) - 1));
-
-    /* Remove access to E820_UNUSABLE I/O regions above 1MB. */
-    for ( i = 0; i < e820.nr_map; i++ )
-    {
-        unsigned long sfn, efn;
-        sfn = max_t(unsigned long, paddr_to_pfn(e820.map[i].addr), 0x100ul);
-        efn = paddr_to_pfn(e820.map[i].addr + e820.map[i].size - 1);
-        if ( (e820.map[i].type == E820_UNUSABLE) &&
-             (e820.map[i].size != 0) &&
-             (sfn <= efn) )
-            rc |= iomem_deny_access(d, sfn, efn);
-    }
-
-    /* Prevent access to HPET */
-    if ( hpet_address )
-    {
-        u8 prot_flags = hpet_flags & ACPI_HPET_PAGE_PROTECT_MASK;
-
-        mfn = paddr_to_pfn(hpet_address);
-        if ( prot_flags == ACPI_HPET_PAGE_PROTECT4 )
-            rc |= iomem_deny_access(d, mfn, mfn);
-        else if ( prot_flags == ACPI_HPET_PAGE_PROTECT64 )
-            rc |= iomem_deny_access(d, mfn, mfn + 15);
-        else if ( ro_hpet )
-            rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
-    }
-
-    BUG_ON(rc != 0);
+    rc = setup_permissions(d);
+    if ( rc != 0 )
+        panic("Failed to setup Dom0 permissions");
 
     if ( elf_check_broken(&elf) )
         printk(" Xen warning: dom0 kernel broken ELF: %s\n",
-- 
2.7.4 (Apple Git-66)



* [PATCH RFC 03/12] xen/x86: allow the emulated APICs to be enabled for the hardware domain
  2016-07-29 16:28 [PATCH RFC 01/12] PVHv2 Dom0 Roger Pau Monne
  2016-07-29 16:28 ` [PATCH RFC 01/12] x86/paging: introduce paging_set_allocation Roger Pau Monne
  2016-07-29 16:28 ` [PATCH RFC 02/12] xen/x86: split the setup of Dom0 permissions to a function Roger Pau Monne
@ 2016-07-29 16:28 ` Roger Pau Monne
  2016-07-29 17:50   ` Andrew Cooper
  2016-07-29 16:28 ` [PATCH RFC 04/12] xen/x86: split Dom0 build into PV and PVHv2 Roger Pau Monne
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Roger Pau Monne @ 2016-07-29 16:28 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

Allow the use of both the emulated local APIC and IO APIC for the hardware
domain.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/domain.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 1133ea2..4d47228 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -545,25 +545,31 @@ int arch_domain_create(struct domain *d, unsigned int domcr_flags,
     }
     else
     {
-        if ( (config->emulation_flags & ~XEN_X86_EMU_ALL) != 0 )
+        uint32_t emflags = config->emulation_flags;
+
+        if ( (emflags & ~XEN_X86_EMU_ALL) != 0 )
         {
             printk(XENLOG_G_ERR "d%d: Invalid emulation bitmap: %#x\n",
-                   d->domain_id, config->emulation_flags);
+                   d->domain_id, emflags);
             return -EINVAL;
         }
+
         if ( is_hardware_domain(d) )
-            config->emulation_flags |= XEN_X86_EMU_PIT;
-        if ( config->emulation_flags != 0 &&
-             (config->emulation_flags !=
-              (is_hvm_domain(d) ? XEN_X86_EMU_ALL : XEN_X86_EMU_PIT)) )
+            emflags |= XEN_X86_EMU_PIT;
+
+        if ( (is_hardware_domain(d) ?
+              (is_hvm_domain(d) && emflags !=
+              (XEN_X86_EMU_PIT|XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC)) :
+              (emflags && (is_hvm_domain(d) ? (emflags != XEN_X86_EMU_ALL) :
+                                              (emflags != XEN_X86_EMU_PIT)))) )
         {
             printk(XENLOG_G_ERR "d%d: Xen does not allow %s domain creation "
                    "with the current selection of emulators: %#x\n",
                    d->domain_id, is_hvm_domain(d) ? "HVM" : "PV",
-                   config->emulation_flags);
+                   emflags);
             return -EOPNOTSUPP;
         }
-        d->arch.emulation_flags = config->emulation_flags;
+        d->arch.emulation_flags = emflags;
     }
 
     if ( has_hvm_container_domain(d) )
-- 
2.7.4 (Apple Git-66)



* [PATCH RFC 04/12] xen/x86: split Dom0 build into PV and PVHv2
  2016-07-29 16:28 [PATCH RFC 01/12] PVHv2 Dom0 Roger Pau Monne
                   ` (2 preceding siblings ...)
  2016-07-29 16:28 ` [PATCH RFC 03/12] xen/x86: allow the emulated APICs to be enabled for the hardware domain Roger Pau Monne
@ 2016-07-29 16:28 ` Roger Pau Monne
  2016-07-29 17:57   ` Andrew Cooper
  2016-07-29 16:29 ` [PATCH RFC 05/12] xen/x86: make print_e820_memory_map global Roger Pau Monne
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Roger Pau Monne @ 2016-07-29 16:28 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

Split the Dom0 builder into two different functions, one for PV (and classic
PVH), and another one for PVHv2. Introduce a new command line parameter,
dom0hvm, in order to request the creation of a PVHv2 Dom0.
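
For reference, a minimal sketch of how this would be enabled (the GRUB2 
syntax and paths below are illustrative, not part of the patch): append 
dom0hvm=1 to the Xen command line, e.g.

    multiboot2 /boot/xen.gz dom0hvm=1 dom0_mem=4096M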

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/domain_build.c | 27 ++++++++++++++++++++++++++-
 xen/arch/x86/setup.c        |  9 +++++++++
 2 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index 09d79be..c0ef40f 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -952,7 +952,7 @@ static int __init setup_permissions(struct domain *d)
     return rc;
 }
 
-int __init construct_dom0(
+static int __init construct_dom0_pv(
     struct domain *d,
     const module_t *image, unsigned long image_headroom,
     module_t *initrd,
@@ -1647,6 +1647,31 @@ out:
     return rc;
 }
 
+static int __init construct_dom0_hvm(struct domain *d, const module_t *image,
+                                     unsigned long image_headroom,
+                                     module_t *initrd,
+                                     void *(*bootstrap_map)(const module_t *),
+                                     char *cmdline)
+{
+
+    printk("** Building a PVH Dom0 **\n");
+
+    return 0;
+}
+
+int __init construct_dom0(struct domain *d, const module_t *image,
+                          unsigned long image_headroom, module_t *initrd,
+                          void *(*bootstrap_map)(const module_t *),
+                          char *cmdline)
+{
+
+    return is_hvm_domain(d) ?
+            construct_dom0_hvm(d, image, image_headroom, initrd,
+                               bootstrap_map, cmdline) :
+            construct_dom0_pv(d, image, image_headroom, initrd, bootstrap_map,
+                              cmdline);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 217c775..8d9c3a0 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -75,6 +75,10 @@ unsigned long __read_mostly cr4_pv32_mask;
 static bool_t __initdata opt_dom0pvh;
 boolean_param("dom0pvh", opt_dom0pvh);
 
+/* Boot dom0 in HVM mode */
+static bool_t __initdata opt_dom0hvm;
+boolean_param("dom0hvm", opt_dom0hvm);
+
 /* **** Linux config option: propagated to domain0. */
 /* "acpi=off":    Sisables both ACPI table parsing and interpreter. */
 /* "acpi=force":  Override the disable blacklist.                   */
@@ -1492,6 +1496,11 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     if ( opt_dom0pvh )
         domcr_flags |= DOMCRF_pvh | DOMCRF_hap;
 
+    if ( opt_dom0hvm ) {
+        domcr_flags |= DOMCRF_hvm | (hvm_funcs.hap_supported ? DOMCRF_hap : 0);
+        config.emulation_flags = XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC;
+    }
+
     /*
      * Create initial domain 0.
      * x86 doesn't support arch-configuration. So it's fine to pass
-- 
2.7.4 (Apple Git-66)



* [PATCH RFC 05/12] xen/x86: make print_e820_memory_map global
  2016-07-29 16:28 [PATCH RFC 01/12] PVHv2 Dom0 Roger Pau Monne
                   ` (3 preceding siblings ...)
  2016-07-29 16:28 ` [PATCH RFC 04/12] xen/x86: split Dom0 build into PV and PVHv2 Roger Pau Monne
@ 2016-07-29 16:29 ` Roger Pau Monne
  2016-07-29 17:57   ` Andrew Cooper
  2016-07-29 16:29 ` [PATCH RFC 06/12] xen/x86: populate PVHv2 Dom0 physical memory map Roger Pau Monne
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Roger Pau Monne @ 2016-07-29 16:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

So that it can be called from the Dom0 builder.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/e820.c        | 2 +-
 xen/include/asm-x86/e820.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/e820.c b/xen/arch/x86/e820.c
index ef077a5..48e35f9 100644
--- a/xen/arch/x86/e820.c
+++ b/xen/arch/x86/e820.c
@@ -87,7 +87,7 @@ static void __init add_memory_region(unsigned long long start,
     e820.nr_map++;
 }
 
-static void __init print_e820_memory_map(struct e820entry *map, unsigned int entries)
+void __init print_e820_memory_map(struct e820entry *map, unsigned int entries)
 {
     unsigned int i;
 
diff --git a/xen/include/asm-x86/e820.h b/xen/include/asm-x86/e820.h
index d9ff4eb..9dad76a 100644
--- a/xen/include/asm-x86/e820.h
+++ b/xen/include/asm-x86/e820.h
@@ -31,6 +31,7 @@ extern int e820_change_range_type(
 extern int e820_add_range(
     struct e820map *, uint64_t s, uint64_t e, uint32_t type);
 extern unsigned long init_e820(const char *, struct e820entry *, unsigned int *);
+extern void print_e820_memory_map(struct e820entry *map, unsigned int entries);
 extern struct e820map e820;
 
 /* These symbols live in the boot trampoline. */
-- 
2.7.4 (Apple Git-66)



* [PATCH RFC 06/12] xen/x86: populate PVHv2 Dom0 physical memory map
  2016-07-29 16:28 [PATCH RFC 01/12] PVHv2 Dom0 Roger Pau Monne
                   ` (4 preceding siblings ...)
  2016-07-29 16:29 ` [PATCH RFC 05/12] xen/x86: make print_e820_memory_map global Roger Pau Monne
@ 2016-07-29 16:29 ` Roger Pau Monne
  2016-07-29 19:04   ` Andrew Cooper
  2016-07-29 16:29 ` [PATCH RFC 07/12] xen/x86: parse Dom0 kernel for PVHv2 Roger Pau Monne
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Roger Pau Monne @ 2016-07-29 16:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

Craft the Dom0 e820 memory map and populate it.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/domain_build.c | 199 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 193 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index c0ef40f..cb8ecbd 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -43,6 +43,11 @@ static long __initdata dom0_nrpages;
 static long __initdata dom0_min_nrpages;
 static long __initdata dom0_max_nrpages = LONG_MAX;
 
+/* GFN of the identity map for EPT. */
+#define HVM_IDENT_PT_GFN  0xfeffeu
+
+static unsigned int __initdata hvm_mem_stats[MAX_ORDER + 1];
+
 /*
  * dom0_mem=[min:<min_amt>,][max:<max_amt>,][<amt>]
  * 
@@ -304,7 +309,8 @@ static unsigned long __init compute_dom0_nr_pages(
             avail -= max_pdx >> s;
     }
 
-    need_paging = opt_dom0_shadow || (is_pvh_domain(d) && !iommu_hap_pt_share);
+    need_paging = opt_dom0_shadow || (has_hvm_container_domain(d) &&
+                  (!iommu_hap_pt_share || !paging_mode_hap(d)));
     for ( ; ; need_paging = 0 )
     {
         nr_pages = dom0_nrpages;
@@ -336,7 +342,8 @@ static unsigned long __init compute_dom0_nr_pages(
         avail -= dom0_paging_pages(d, nr_pages);
     }
 
-    if ( (parms->p2m_base == UNSET_ADDR) && (dom0_nrpages <= 0) &&
+    if ( is_pv_domain(d) &&
+         (parms->p2m_base == UNSET_ADDR) && (dom0_nrpages <= 0) &&
          ((dom0_min_nrpages <= 0) || (nr_pages > min_pages)) )
     {
         /*
@@ -547,11 +554,12 @@ static __init void pvh_map_all_iomem(struct domain *d, unsigned long nr_pages)
     ASSERT(nr_holes == 0);
 }
 
-static __init void pvh_setup_e820(struct domain *d, unsigned long nr_pages)
+static __init void hvm_setup_e820(struct domain *d, unsigned long nr_pages)
 {
     struct e820entry *entry, *entry_guest;
     unsigned int i;
     unsigned long pages, cur_pages = 0;
+    uint64_t start, end;
 
     /*
      * Craft the e820 memory map for Dom0 based on the hardware e820 map.
@@ -579,8 +587,19 @@ static __init void pvh_setup_e820(struct domain *d, unsigned long nr_pages)
             continue;
         }
 
-        *entry_guest = *entry;
-        pages = PFN_UP(entry_guest->size);
+        /*
+         * Make sure the start and length are aligned to PAGE_SIZE, because
+         * that's the minimum granularity of the 2nd stage translation.
+         */
+        start = ROUNDUP(entry->addr, PAGE_SIZE);
+        end = (entry->addr + entry->size) & PAGE_MASK;
+        if ( start >= end )
+            continue;
+
+        entry_guest->type = E820_RAM;
+        entry_guest->addr = start;
+        entry_guest->size = end - start;
+        pages = PFN_DOWN(entry_guest->size);
         if ( (cur_pages + pages) > nr_pages )
         {
             /* Truncate region */
@@ -591,6 +610,8 @@ static __init void pvh_setup_e820(struct domain *d, unsigned long nr_pages)
         {
             cur_pages += pages;
         }
+        ASSERT((entry_guest->addr & ~PAGE_MASK) == 0 &&
+               (entry_guest->size & ~PAGE_MASK) == 0);
  next:
         d->arch.nr_e820++;
         entry_guest++;
@@ -1631,7 +1652,7 @@ static int __init construct_dom0_pv(
         dom0_update_physmap(d, pfn, mfn, 0);
 
         pvh_map_all_iomem(d, nr_pages);
-        pvh_setup_e820(d, nr_pages);
+        hvm_setup_e820(d, nr_pages);
     }
 
     if ( d->domain_id == hardware_domid )
@@ -1647,15 +1668,181 @@ out:
     return rc;
 }
 
+/* Helper to convert from bytes into human-readable form. */
+static void __init pretty_print_bytes(uint64_t size)
+{
+    const char* units[] = {"B", "KB", "MB", "GB", "TB"};
+    int i = 0;
+
+    while ( ++i < sizeof(units) && size >= 1024 )
+        size >>= 10; /* size /= 1024 */
+
+    printk("%4" PRIu64 "%2s", size, units[i-1]);
+}
+
+/* Calculate the biggest usable order given a size in bytes. */
+static inline unsigned int get_order(uint64_t size)
+{
+    unsigned int order;
+    uint64_t pg;
+
+    ASSERT((size & ~PAGE_MASK) == 0);
+    pg = PFN_DOWN(size);
+    for ( order = 0; pg >= (1 << (order + 1)); order++ );
+
+    return order;
+}
+
+/* Populate an HVM memory range using the biggest possible order. */
+static void __init hvm_populate_memory_range(struct domain *d, uint64_t start,
+                                             uint64_t size)
+{
+    static unsigned int __initdata memflags = MEMF_no_dma|MEMF_exact_node;
+    unsigned int order;
+    struct page_info *page;
+    int rc;
+
+    ASSERT((size & ~PAGE_MASK) == 0 && (start & ~PAGE_MASK) == 0);
+
+    order = MAX_ORDER;
+    while ( size != 0 )
+    {
+        order = min(get_order(size), order);
+        page = alloc_domheap_pages(d, order, memflags);
+        if ( page == NULL )
+        {
+            if ( order == 0 && memflags )
+            {
+                /* Try again without any memflags. */
+                memflags = 0;
+                order = MAX_ORDER;
+                continue;
+            }
+            if ( order == 0 )
+                panic("Unable to allocate memory with order 0!\n");
+            order--;
+            continue;
+        }
+
+        hvm_mem_stats[order]++;
+        rc = guest_physmap_add_page(d, _gfn(PFN_DOWN(start)),
+                                    _mfn(page_to_mfn(page)), order);
+        if ( rc != 0 )
+            panic("Failed to populate memory: [%" PRIx64 " - %" PRIx64 "] %d\n",
+                  start, start + (((uint64_t)1) << (order + PAGE_SHIFT)), rc);
+        start += ((uint64_t)1) << (order + PAGE_SHIFT);
+        size -= ((uint64_t)1) << (order + PAGE_SHIFT);
+        if ( (size & 0xffffffff) == 0 )
+            process_pending_softirqs();
+    }
+
+}
+
+static int __init hvm_setup_p2m(struct domain *d)
+{
+    struct vcpu *v = d->vcpu[0];
+    unsigned long nr_pages;
+    int i;
+
+    printk("** Preparing memory map **\n");
+
+    /*
+     * Subtract one page for the EPT identity page table and two pages
+     * for the MADT replacement.
+     */
+    nr_pages = compute_dom0_nr_pages(d, NULL, 0) - 3;
+
+    hvm_setup_e820(d, nr_pages);
+    paging_set_allocation(d, dom0_paging_pages(d, nr_pages));
+
+    printk("Dom0 memory map:\n");
+    print_e820_memory_map(d->arch.e820, d->arch.nr_e820);
+
+    printk("** Populating memory map **\n");
+    /* Populate memory map. */
+    for ( i = 0; i < d->arch.nr_e820; i++ )
+    {
+        if ( d->arch.e820[i].type != E820_RAM )
+            continue;
+
+        hvm_populate_memory_range(d, d->arch.e820[i].addr,
+                                  d->arch.e820[i].size);
+    }
+
+    printk("Memory allocation stats:\n");
+    for ( i = 0; i <= MAX_ORDER; i++ )
+    {
+        if ( hvm_mem_stats[MAX_ORDER - i] != 0 )
+        {
+            printk("Order %2u: ", MAX_ORDER - i);
+            pretty_print_bytes(((uint64_t)1 << (MAX_ORDER - i + PAGE_SHIFT)) *
+                               hvm_mem_stats[MAX_ORDER - i]);
+            printk("\n");
+        }
+    }
+
+    if ( cpu_has_vmx && paging_mode_hap(d) && !vmx_unrestricted_guest(v) )
+    {
+        struct vcpu *saved_current;
+        struct page_info *page;
+        uint32_t *ident_pt;
+
+        /*
+         * Identity-map page table is required for running with CR0.PG=0
+         * when using Intel EPT. Create a 32-bit non-PAE page directory of
+         * superpages.
+         */
+        page = alloc_domheap_pages(d, 0, 0);
+        if ( unlikely(!page) )
+        {
+            printk("Unable to allocate page for identity map\n");
+            return -ENOMEM;
+        }
+
+        saved_current = current;
+        set_current(v);
+
+        ident_pt = __map_domain_page(page);
+        for ( i = 0; i < PAGE_SIZE / sizeof(*ident_pt); i++ )
+            ident_pt[i] = ((i << 22) | _PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
+                           _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
+        unmap_domain_page(ident_pt);
+
+        guest_physmap_add_page(d, _gfn(HVM_IDENT_PT_GFN),
+                               _mfn(page_to_mfn(page)), 0);
+        d->arch.hvm_domain.params[HVM_PARAM_IDENT_PT] =
+                                 HVM_IDENT_PT_GFN << PAGE_SHIFT;
+        set_current(saved_current);
+    }
+
+    return 0;
+}
+
 static int __init construct_dom0_hvm(struct domain *d, const module_t *image,
                                      unsigned long image_headroom,
                                      module_t *initrd,
                                      void *(*bootstrap_map)(const module_t *),
                                      char *cmdline)
 {
+    int rc;
 
     printk("** Building a PVH Dom0 **\n");
 
+    /* Sanity! */
+    BUG_ON(d->domain_id != 0);
+    BUG_ON(d->vcpu[0] == NULL);
+
+    process_pending_softirqs();
+
+    iommu_hwdom_init(d);
+
+    rc = hvm_setup_p2m(d);
+    if ( rc )
+    {
+        printk("Failed to setup Dom0 physical memory map\n");
+        return rc;
+    }
+
     return 0;
 }
 
-- 
2.7.4 (Apple Git-66)



* [PATCH RFC 07/12] xen/x86: parse Dom0 kernel for PVHv2
  2016-07-29 16:28 [PATCH RFC 01/12] PVHv2 Dom0 Roger Pau Monne
                   ` (5 preceding siblings ...)
  2016-07-29 16:29 ` [PATCH RFC 06/12] xen/x86: populate PVHv2 Dom0 physical memory map Roger Pau Monne
@ 2016-07-29 16:29 ` Roger Pau Monne
  2016-09-26 16:16   ` Jan Beulich
  2016-07-29 16:29 ` [PATCH RFC 08/12] xen/x86: setup PVHv2 Dom0 CPUs Roger Pau Monne
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Roger Pau Monne @ 2016-07-29 16:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

Introduce a helper to parse the Dom0 kernel.
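
For context (not part of the patch): the start-of-day structure filled in by 
this helper is the one from public/arch-x86/hvm/start_info.h; its layout, as 
I recall it from that header (abridged), is roughly:

    struct hvm_start_info {
        uint32_t magic;         /* XEN_HVM_START_MAGIC_VALUE (0x336ec578)    */
        uint32_t version;
        uint32_t flags;         /* SIF_PRIVILEGED | SIF_INITDOMAIN for Dom0  */
        uint32_t nr_modules;
        uint64_t modlist_paddr; /* Physical address of the module list       */
        uint64_t cmdline_paddr; /* Physical address of the command line      */
        uint64_t rsdp_paddr;    /* Physical address of the RSDP (patch 09)   */
    };

The BSP receives the physical address of this structure in %ebx (see patch 08).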

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/domain_build.c | 139 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 139 insertions(+)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index cb8ecbd..df6354a 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -22,6 +22,7 @@
 #include <xen/compat.h>
 #include <xen/libelf.h>
 #include <xen/pfn.h>
+#include <xen/guest_access.h>
 #include <asm/regs.h>
 #include <asm/system.h>
 #include <asm/io.h>
@@ -38,6 +39,7 @@
 #include <asm/hpet.h>
 
 #include <public/version.h>
+#include <public/arch-x86/hvm/start_info.h>
 
 static long __initdata dom0_nrpages;
 static long __initdata dom0_min_nrpages;
@@ -1818,12 +1820,141 @@ static int __init hvm_setup_p2m(struct domain *d)
     return 0;
 }
 
+static int __init hvm_load_kernel(struct domain *d, const module_t *image,
+                                  unsigned long image_headroom,
+                                  module_t *initrd, char *image_base,
+                                  char *cmdline, paddr_t *entry,
+                                  paddr_t *start_info_addr)
+{
+    char *image_start = image_base + image_headroom;
+    unsigned long image_len = image->mod_end;
+    struct elf_binary elf;
+    struct elf_dom_parms parms;
+    paddr_t last_addr;
+    struct hvm_start_info start_info;
+    struct hvm_modlist_entry mod;
+    struct vcpu *saved_current, *v = d->vcpu[0];
+    int rc;
+
+    printk("** Parsing Dom0 kernel **\n");
+
+    if ( (rc = bzimage_parse(image_base, &image_start, &image_len)) != 0 )
+    {
+        printk("Error trying to detect bz compressed kernel\n");
+        return rc;
+    }
+
+    if ( (rc = elf_init(&elf, image_start, image_len)) != 0 )
+    {
+        printk("Unable to init ELF\n");
+        return rc;
+    }
+#ifdef VERBOSE
+    elf_set_verbose(&elf);
+#endif
+    elf_parse_binary(&elf);
+    if ( (rc = elf_xen_parse(&elf, &parms)) != 0 )
+    {
+        printk("Unable to parse kernel for ELFNOTES\n");
+        return rc;
+    }
+
+    if ( parms.phys_entry == UNSET_ADDR32 ) {
+        printk("Unable to find kernel entry point, aborting\n");
+        return -EINVAL;
+    }
+
+    printk("OS: %s version: %s loader: %s bitness: %s\n", parms.guest_os,
+           parms.guest_ver, parms.loader,
+           elf_64bit(&elf) ? "64-bit" : "32-bit");
+
+    printk("** Loading Dom0 kernel **\n");
+    /* Copy the OS image and free temporary buffer. */
+    elf.dest_base = (void *)(parms.virt_kstart - parms.virt_base);
+    elf.dest_size = parms.virt_kend - parms.virt_kstart;
+
+    saved_current = current;
+    set_current(v);
+
+    rc = elf_load_binary(&elf);
+    if ( rc < 0 )
+    {
+        printk("Failed to load kernel: %d\n", rc);
+        printk("Xen dom0 kernel broken ELF: %s\n", elf_check_broken(&elf));
+        goto out;
+    }
+
+    last_addr = ROUNDUP(parms.virt_kend - parms.virt_base, PAGE_SIZE);
+    printk("** Copying Dom0 modules **\n");
+
+    rc = hvm_copy_to_guest_phys(last_addr, mfn_to_virt(initrd->mod_start),
+                                initrd->mod_end);
+    if ( rc != HVMCOPY_okay )
+    {
+        printk("Unable to copy initrd to guest\n");
+        rc = -EFAULT;
+        goto out;
+    }
+
+    mod.paddr = last_addr;
+    mod.size = initrd->mod_end;
+    last_addr += ROUNDUP(initrd->mod_end, PAGE_SIZE);
+
+    /* Free temporary buffers. */
+    discard_initial_images();
+
+    printk("** Setting up start-of-day info **\n");
+
+    memset(&start_info, 0, sizeof(start_info));
+    if ( cmdline != NULL )
+    {
+        rc = hvm_copy_to_guest_phys(last_addr, cmdline, strlen(cmdline) + 1);
+        if ( rc != HVMCOPY_okay )
+        {
+            printk("Unable to copy guest command line\n");
+            rc = -EFAULT;
+            goto out;
+        }
+        start_info.cmdline_paddr = last_addr;
+        last_addr += ROUNDUP(strlen(cmdline) + 1, 8);
+    }
+    rc = hvm_copy_to_guest_phys(last_addr, &mod, sizeof(mod));
+    if ( rc != HVMCOPY_okay )
+    {
+        printk("Unable to copy guest modules\n");
+        rc = -EFAULT;
+        goto out;
+    }
+
+    start_info.modlist_paddr = last_addr;
+    start_info.nr_modules = 1;
+    last_addr += sizeof(mod);
+    start_info.magic = XEN_HVM_START_MAGIC_VALUE;
+    start_info.flags = SIF_PRIVILEGED | SIF_INITDOMAIN;
+    rc = hvm_copy_to_guest_phys(last_addr, &start_info, sizeof(start_info));
+    if ( rc != HVMCOPY_okay )
+    {
+        printk("Unable to copy start info to guest\n");
+        rc = -EFAULT;
+        goto out;
+    }
+
+    *entry = parms.phys_entry;
+    *start_info_addr = last_addr;
+    rc = 0;
+
+out:
+    set_current(saved_current);
+    return rc;
+}
+
 static int __init construct_dom0_hvm(struct domain *d, const module_t *image,
                                      unsigned long image_headroom,
                                      module_t *initrd,
                                      void *(*bootstrap_map)(const module_t *),
                                      char *cmdline)
 {
+    paddr_t entry, start_info;
     int rc;
 
     printk("** Building a PVH Dom0 **\n");
@@ -1843,6 +1974,14 @@ static int __init construct_dom0_hvm(struct domain *d, const module_t *image,
         return rc;
     }
 
+    rc = hvm_load_kernel(d, image, image_headroom, initrd, bootstrap_map(image),
+                         cmdline, &entry, &start_info);
+    if ( rc )
+    {
+        printk("Failed to load Dom0 kernel\n");
+        return rc;
+    }
+
     return 0;
 }
 
-- 
2.7.4 (Apple Git-66)



* [PATCH RFC 08/12] xen/x86: setup PVHv2 Dom0 CPUs
  2016-07-29 16:28 [PATCH RFC 01/12] PVHv2 Dom0 Roger Pau Monne
                   ` (6 preceding siblings ...)
  2016-07-29 16:29 ` [PATCH RFC 07/12] xen/x86: parse Dom0 kernel for PVHv2 Roger Pau Monne
@ 2016-07-29 16:29 ` Roger Pau Monne
  2016-09-26 16:19   ` Jan Beulich
  2016-07-29 16:29 ` [PATCH RFC 09/12] xen/x86: setup PVHv2 Dom0 ACPI tables Roger Pau Monne
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Roger Pau Monne @ 2016-07-29 16:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

Initialize Dom0 BSP/APs and setup the memory and IO permissions.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
The logic used to set up the CPUID leaves is extremely simplistic (and
probably wrong for hardware different from mine). I'm not sure what the
best way to deal with this is; the code that currently sets the CPUID leaves
for HVM guests lives in libxc, maybe moving it to xen/common would be better?
---
 xen/arch/x86/domain_build.c | 103 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 103 insertions(+)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index df6354a..89ef59e 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -40,6 +40,7 @@
 
 #include <public/version.h>
 #include <public/arch-x86/hvm/start_info.h>
+#include <public/hvm/hvm_vcpu.h>
 
 static long __initdata dom0_nrpages;
 static long __initdata dom0_min_nrpages;
@@ -1948,6 +1949,101 @@ out:
     return rc;
 }
 
+static int __init hvm_setup_cpus(struct domain *d, paddr_t entry,
+                                 paddr_t start_info)
+{
+    vcpu_hvm_context_t cpu_ctx;
+    struct vcpu *v = d->vcpu[0];
+    int cpu, i, rc;
+    struct {
+        uint32_t index;
+        uint32_t count;
+    } cpuid_leaves[] = {
+        {0, XEN_CPUID_INPUT_UNUSED},
+        {1, XEN_CPUID_INPUT_UNUSED},
+        {2, XEN_CPUID_INPUT_UNUSED},
+        {4, 0},
+        {4, 1},
+        {4, 2},
+        {4, 3},
+        {4, 4},
+        {7, 0},
+        {0xa, XEN_CPUID_INPUT_UNUSED},
+        {0xd, 0},
+        {0x80000000, XEN_CPUID_INPUT_UNUSED},
+        {0x80000001, XEN_CPUID_INPUT_UNUSED},
+        {0x80000002, XEN_CPUID_INPUT_UNUSED},
+        {0x80000003, XEN_CPUID_INPUT_UNUSED},
+        {0x80000004, XEN_CPUID_INPUT_UNUSED},
+        {0x80000005, XEN_CPUID_INPUT_UNUSED},
+        {0x80000006, XEN_CPUID_INPUT_UNUSED},
+        {0x80000007, XEN_CPUID_INPUT_UNUSED},
+        {0x80000008, XEN_CPUID_INPUT_UNUSED},
+    };
+
+    printk("** Setting up BSP/APs **\n");
+
+    cpu = v->processor;
+    for ( i = 1; i < d->max_vcpus; i++ )
+    {
+        cpu = cpumask_cycle(cpu, &dom0_cpus);
+        setup_dom0_vcpu(d, i, cpu);
+    }
+
+    memset(&cpu_ctx, 0, sizeof(cpu_ctx));
+
+    cpu_ctx.mode = VCPU_HVM_MODE_32B;
+
+    cpu_ctx.cpu_regs.x86_32.ebx = start_info;
+    cpu_ctx.cpu_regs.x86_32.eip = entry;
+    cpu_ctx.cpu_regs.x86_32.cr0 = X86_CR0_PE | X86_CR0_ET;
+
+    cpu_ctx.cpu_regs.x86_32.cs_limit = ~0u;
+    cpu_ctx.cpu_regs.x86_32.ds_limit = ~0u;
+    cpu_ctx.cpu_regs.x86_32.ss_limit = ~0u;
+    cpu_ctx.cpu_regs.x86_32.tr_limit = 0x67;
+    cpu_ctx.cpu_regs.x86_32.cs_ar = 0xc9b;
+    cpu_ctx.cpu_regs.x86_32.ds_ar = 0xc93;
+    cpu_ctx.cpu_regs.x86_32.ss_ar = 0xc93;
+    cpu_ctx.cpu_regs.x86_32.tr_ar = 0x8b;
+
+    rc = arch_set_info_hvm_guest(v, &cpu_ctx);
+    if ( rc )
+    {
+        printk("Unable to setup Dom0 BSP context: %d\n", rc);
+        return rc;
+    }
+    clear_bit(_VPF_down, &v->pause_flags);
+
+    for ( i = 0; i < ARRAY_SIZE(cpuid_leaves); i++ )
+    {
+        d->arch.cpuids[i].input[0] = cpuid_leaves[i].index;
+        d->arch.cpuids[i].input[1] = cpuid_leaves[i].count;
+        if ( d->arch.cpuids[i].input[1] == XEN_CPUID_INPUT_UNUSED )
+            cpuid(d->arch.cpuids[i].input[0], &d->arch.cpuids[i].eax,
+                  &d->arch.cpuids[i].ebx, &d->arch.cpuids[i].ecx,
+                  &d->arch.cpuids[i].edx);
+        else
+            cpuid_count(d->arch.cpuids[i].input[0], d->arch.cpuids[i].input[1],
+                        &d->arch.cpuids[i].eax, &d->arch.cpuids[i].ebx,
+                        &d->arch.cpuids[i].ecx, &d->arch.cpuids[i].edx);
+        /* XXX: we need to do much more filtering here. */
+        if ( d->arch.cpuids[i].input[0] == 1 )
+            d->arch.cpuids[i].ecx &= ~X86_FEATURE_VMX;
+    }
+
+    rc = setup_permissions(d);
+    if ( rc )
+    {
+        panic("Unable to setup Dom0 permissions: %d\n", rc);
+        return rc;
+    }
+
+    update_domain_wallclock_time(d);
+
+    return 0;
+}
+
 static int __init construct_dom0_hvm(struct domain *d, const module_t *image,
                                      unsigned long image_headroom,
                                      module_t *initrd,
@@ -1982,6 +2078,13 @@ static int __init construct_dom0_hvm(struct domain *d, const module_t *image,
         return rc;
     }
 
+    rc = hvm_setup_cpus(d, entry, start_info);
+    if ( rc )
+    {
+        printk("Failed to setup Dom0 CPUs: %d\n", rc);
+        return rc;
+    }
+
     return 0;
 }
 
-- 
2.7.4 (Apple Git-66)



* [PATCH RFC 09/12] xen/x86: setup PVHv2 Dom0 ACPI tables
  2016-07-29 16:28 [PATCH RFC 01/12] PVHv2 Dom0 Roger Pau Monne
                   ` (7 preceding siblings ...)
  2016-07-29 16:29 ` [PATCH RFC 08/12] xen/x86: setup PVHv2 Dom0 CPUs Roger Pau Monne
@ 2016-07-29 16:29 ` Roger Pau Monne
  2016-09-26 16:21   ` Jan Beulich
  2016-07-29 16:29 ` [PATCH RFC 10/12] xen/dpci: add a dpci passthrough handler for hardware domain Roger Pau Monne
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 49+ messages in thread
From: Roger Pau Monne @ 2016-07-29 16:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

This maps all the regions in the e820 marked as E820_ACPI or E820_NVS to
Dom0 1:1. It also shadows the page(s) where the native MADT is placed by
mapping a RAM page over it, copying the original data and modifying it
afterwards in order to represent the real CPU topology exposed to Dom0.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
FWIW, I think that the current approach I've used to craft the MADT is not
the best one. IMHO it would be better to place the MADT at the end of the
E820_ACPI region (expanding its size by one page) and modify the XSDT/RSDT
to point to it; that way we avoid shadowing any other ACPI data that might
be in the same page as the native MADT (and that needs to be modified by
Dom0).
 xen/arch/x86/domain_build.c | 250 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 250 insertions(+)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index 89ef59e..fad4f5c 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -23,6 +23,7 @@
 #include <xen/libelf.h>
 #include <xen/pfn.h>
 #include <xen/guest_access.h>
+#include <xen/acpi.h>
 #include <asm/regs.h>
 #include <asm/system.h>
 #include <asm/io.h>
@@ -38,6 +39,8 @@
 #include <asm/io_apic.h>
 #include <asm/hpet.h>
 
+#include <acpi/actables.h>
+
 #include <public/version.h>
 #include <public/arch-x86/hvm/start_info.h>
 #include <public/hvm/hvm_vcpu.h>
@@ -50,6 +53,8 @@ static long __initdata dom0_max_nrpages = LONG_MAX;
 #define HVM_IDENT_PT_GFN  0xfeffeu
 
 static unsigned int __initdata hvm_mem_stats[MAX_ORDER + 1];
+static unsigned int __initdata acpi_intr_overrrides = 0;
+static struct acpi_madt_interrupt_override __initdata *intsrcovr = NULL;
 
 /*
  * dom0_mem=[min:<min_amt>,][max:<max_amt>,][<amt>]
@@ -1932,6 +1937,7 @@ static int __init hvm_load_kernel(struct domain *d, const module_t *image,
     last_addr += sizeof(mod);
     start_info.magic = XEN_HVM_START_MAGIC_VALUE;
     start_info.flags = SIF_PRIVILEGED | SIF_INITDOMAIN;
+    start_info.rsdp_paddr = acpi_os_get_root_pointer();
     rc = hvm_copy_to_guest_phys(last_addr, &start_info, sizeof(start_info));
     if ( rc != HVMCOPY_okay )
     {
@@ -2044,6 +2050,243 @@ static int __init hvm_setup_cpus(struct domain *d, paddr_t entry,
     return 0;
 }
 
+static int __init acpi_count_intr_ov(struct acpi_subtable_header *header,
+                                     const unsigned long end)
+{
+
+    acpi_intr_overrrides++;
+    return 0;
+}
+
+static int __init acpi_set_intr_ov(struct acpi_subtable_header *header,
+                                   const unsigned long end)
+{
+    struct acpi_madt_interrupt_override *intr =
+        container_of(header, struct acpi_madt_interrupt_override, header);
+
+    ACPI_MEMCPY(intsrcovr, intr, sizeof(*intr));
+    intsrcovr++;
+
+    return 0;
+}
+
+static void __init acpi_zap_table_signature(char *name)
+{
+    struct acpi_table_header *table;
+    acpi_status status;
+    union {
+        char str[ACPI_NAME_SIZE];
+        uint32_t bits;
+    } signature;
+    char tmp;
+    int i;
+
+    status = acpi_get_table(name, 0, &table);
+    if ( ACPI_SUCCESS(status) )
+    {
+        memcpy(&signature.str[0], &table->signature[0], ACPI_NAME_SIZE);
+        for ( i = 0; i < ACPI_NAME_SIZE / 2; i++ )
+        {
+            tmp = signature.str[ACPI_NAME_SIZE - i - 1];
+            signature.str[ACPI_NAME_SIZE - i - 1] = signature.str[i];
+            signature.str[i] = tmp;
+        }
+        write_atomic((uint32_t*)&table->signature[0], signature.bits);
+    }
+}
+
+static int __init acpi_map(struct domain *d, unsigned long pfn,
+                           unsigned long nr_pages)
+{
+    int rc;
+
+    while ( nr_pages > 0 )
+    {
+        rc = map_mmio_regions(d, _gfn(pfn), nr_pages, _mfn(pfn));
+        if ( rc == 0 )
+            break;
+        if ( rc < 0 )
+        {
+            printk("Failed to map %#lx - %#lx into Dom0 memory map: %d\n",
+                   pfn, pfn + nr_pages, rc);
+            return rc;
+        }
+        nr_pages -= rc;
+        pfn += rc;
+        process_pending_softirqs();
+    }
+
+    return rc;
+}
+
+static int __init hvm_setup_acpi(struct domain *d)
+{
+    struct vcpu *saved_current, *v = d->vcpu[0];
+    unsigned long pfn, nr_pages;
+    uint64_t size, start_addr, end_addr;
+    uint64_t madt_addr[2] = { 0, 0 };
+    struct acpi_table_header *table;
+    struct acpi_table_madt *madt;
+    struct acpi_madt_io_apic *io_apic;
+    struct acpi_madt_local_apic *local_apic;
+    acpi_status status;
+    int rc, i;
+
+    printk("** Setup ACPI tables **\n");
+
+    /* ZAP the HPET, SLIT, SRAT, MPST and PMTT tables. */
+    acpi_zap_table_signature(ACPI_SIG_HPET);
+    acpi_zap_table_signature(ACPI_SIG_SLIT);
+    acpi_zap_table_signature(ACPI_SIG_SRAT);
+    acpi_zap_table_signature(ACPI_SIG_MPST);
+    acpi_zap_table_signature(ACPI_SIG_PMTT);
+
+    /* Map ACPI tables 1:1 */
+    for ( i = 0; i < d->arch.nr_e820; i++ )
+    {
+        if ( d->arch.e820[i].type != E820_ACPI &&
+             d->arch.e820[i].type != E820_NVS )
+            continue;
+
+        pfn = PFN_DOWN(d->arch.e820[i].addr);
+        nr_pages = DIV_ROUND_UP(d->arch.e820[i].size, PAGE_SIZE);
+
+        rc = acpi_map(d, pfn, nr_pages);
+        if ( rc )
+        {
+            printk(
+                "Failed to map ACPI region %#lx - %#lx into Dom0 memory map\n",
+                   pfn, pfn + nr_pages);
+            return rc;
+        }
+    }
+
+    /* Map the first 1MB 1:1 also */
+    pfn = 0;
+    nr_pages = 0x100;
+    rc = acpi_map(d, pfn, nr_pages);
+    if ( rc )
+    {
+        printk(
+            "Failed to map low 1MB region %#lx - %#lx into Dom0 memory map\n",
+               pfn, pfn + nr_pages);
+        return rc;
+    }
+
+    acpi_get_table_phys(ACPI_SIG_MADT, 0, &madt_addr[0], &size);
+    if ( !madt_addr[0] )
+    {
+        printk("Unable to find ACPI MADT table\n");
+        return -EINVAL;
+    }
+    if ( size > PAGE_SIZE )
+    {
+        printk("MADT table is bigger than PAGE_SIZE, aborting\n");
+        return -EINVAL;
+    }
+
+    acpi_get_table_phys(ACPI_SIG_MADT, 2, &madt_addr[1], &size);
+    if ( madt_addr[1] != 0 && madt_addr[1] != madt_addr[0] )
+    {
+        printk("Multiple MADT tables found, aborting\n");
+        return -EINVAL;
+    }
+
+    /*
+     * Populate the guest physical memory were MADT resides with empty RAM
+     * pages. This will remove the 1:1 mapping in this area, so that Xen
+     * can modify it without any side-effects.
+     */
+    start_addr = madt_addr[0] & PAGE_MASK;
+    end_addr = PAGE_ALIGN(madt_addr[0] + size);
+    hvm_populate_memory_range(d, start_addr, end_addr - start_addr);
+
+    /* Get the address where the MADT is currently mapped. */
+    status = acpi_get_table(ACPI_SIG_MADT, 0, &table);
+    if ( !ACPI_SUCCESS(status) )
+    {
+        printk("Failed to get MADT ACPI table, aborting.\n");
+        return -EINVAL;
+    }
+
+    /*
+     * Copy the original MADT table (and whatever is around it) to the
+     * guest physmap.
+     */
+    saved_current = current;
+    set_current(v);
+    rc = hvm_copy_to_guest_phys(start_addr,
+                                (void *)((uintptr_t)table & PAGE_MASK),
+                                end_addr - start_addr);
+    set_current(saved_current);
+    if ( rc != HVMCOPY_okay )
+    {
+        printk("Unable to copy original MADT page(s)\n");
+        return -EFAULT;
+    }
+
+    /* Craft a new MADT for the guest */
+
+    /* Count number of interrupt overrides. */
+    acpi_table_parse_madt(ACPI_MADT_TYPE_INTERRUPT_OVERRIDE, acpi_count_intr_ov,
+                          MAX_IRQ_SOURCES);
+    size = sizeof(struct acpi_table_madt);
+    size += sizeof(struct acpi_madt_interrupt_override) * acpi_intr_overrrides;
+    size += sizeof(struct acpi_madt_io_apic);
+    size += sizeof(struct acpi_madt_local_apic) * dom0_max_vcpus();
+
+    madt = xzalloc_bytes(size);
+    ACPI_MEMCPY(madt, table, sizeof(*madt));
+    madt->address = APIC_DEFAULT_PHYS_BASE;
+    io_apic = (struct acpi_madt_io_apic *)(madt + 1);
+    io_apic->header.type = ACPI_MADT_TYPE_IO_APIC;
+    io_apic->header.length = sizeof(*io_apic);
+    io_apic->id = 1;
+    io_apic->address = VIOAPIC_DEFAULT_BASE_ADDRESS;
+
+    if ( dom0_max_vcpus() > num_online_cpus() )
+    {
+        printk("CPU overcommit is not supported for Dom0\n");
+        xfree(madt);
+        return -EINVAL;
+    }
+
+    local_apic = (struct acpi_madt_local_apic *)(io_apic + 1);
+    for ( i = 0; i < dom0_max_vcpus(); i++ )
+    {
+        local_apic->header.type = ACPI_MADT_TYPE_LOCAL_APIC;
+        local_apic->header.length = sizeof(*local_apic);
+        local_apic->processor_id = i;
+        local_apic->id = i * 2;
+        local_apic->lapic_flags = ACPI_MADT_ENABLED;
+        local_apic++;
+    }
+
+    intsrcovr = (struct acpi_madt_interrupt_override *)local_apic;
+    acpi_table_parse_madt(ACPI_MADT_TYPE_INTERRUPT_OVERRIDE, acpi_set_intr_ov,
+                          MAX_IRQ_SOURCES);
+    ASSERT(((unsigned char *)intsrcovr - (unsigned char *)madt) == size);
+    madt->header.length = size;
+    madt->header.checksum -= acpi_tb_checksum(ACPI_CAST_PTR(u8, madt),
+                                              madt->header.length);
+
+    /* Copy the new MADT table to the guest physmap. */
+    saved_current = current;
+    set_current(v);
+    rc = hvm_copy_to_guest_phys(madt_addr[0], madt, size);
+    set_current(saved_current);
+    if ( rc != HVMCOPY_okay )
+    {
+        printk("Unable to copy modified MADT page(s)\n");
+        xfree(madt);
+        return -EFAULT;
+    }
+
+    xfree(madt);
+
+    return 0;
+}
+
 static int __init construct_dom0_hvm(struct domain *d, const module_t *image,
                                      unsigned long image_headroom,
                                      module_t *initrd,
@@ -2085,6 +2328,13 @@ static int __init construct_dom0_hvm(struct domain *d, const module_t *image,
         return rc;
     }
 
+    rc = hvm_setup_acpi(d);
+    if ( rc )
+    {
+        printk("Failed to setup Dom0 ACPI tables: %d\n", rc);
+        return rc;
+    }
+
     return 0;
 }
 
-- 
2.7.4 (Apple Git-66)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH RFC 10/12] xen/dpci: add a dpci passthrough handler for hardware domain
  2016-07-29 16:28 [PATCH RFC 01/12] PVHv2 Dom0 Roger Pau Monne
                   ` (8 preceding siblings ...)
  2016-07-29 16:29 ` [PATCH RFC 09/12] xen/x86: setup PVHv2 Dom0 ACPI tables Roger Pau Monne
@ 2016-07-29 16:29 ` Roger Pau Monne
  2016-07-29 16:29 ` [PATCH RFC 11/12] xen/x86: allow a PVHv2 Dom0 to register PCI devices with Xen Roger Pau Monne
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Roger Pau Monne @ 2016-07-29 16:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Jan Beulich, Roger Pau Monne

This is very similar to the PCI trap used for the traditional PV(H) Dom0.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
 xen/arch/x86/hvm/io.c         | 72 ++++++++++++++++++++++++++++++++++++++++++-
 xen/arch/x86/traps.c          | 39 -----------------------
 xen/drivers/passthrough/pci.c | 39 +++++++++++++++++++++++
 xen/include/xen/pci.h         |  2 ++
 4 files changed, 112 insertions(+), 40 deletions(-)

diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 1e7a5f9..31d54dc 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -247,12 +247,79 @@ static int dpci_portio_write(const struct hvm_io_handler *handler,
     return X86EMUL_OKAY;
 }
 
+static bool_t hw_dpci_portio_accept(const struct hvm_io_handler *handler,
+                                    const ioreq_t *p)
+{
+    if ( (p->addr == 0xcf8 && p->size == 4) || (p->addr & 0xfffc) == 0xcfc)
+    {
+        return 1;
+    }
+
+    return 0;
+}
+
+static int hw_dpci_portio_read(const struct hvm_io_handler *handler,
+                            uint64_t addr,
+                            uint32_t size,
+                            uint64_t *data)
+{
+    struct domain *currd = current->domain;
+
+    if ( addr == 0xcf8 )
+    {
+        ASSERT(size == 4);
+        *data = currd->arch.pci_cf8;
+        return X86EMUL_OKAY;
+    }
+
+    ASSERT((addr & 0xfffc) == 0xcfc);
+    size = min(size, 4 - ((uint32_t)addr & 3));
+    if ( size == 3 )
+        size = 2;
+    if ( pci_cfg_ok(currd, addr & 3, size, NULL) )
+        *data = pci_conf_read(currd->arch.pci_cf8, addr & 3, size);
+
+    return X86EMUL_OKAY;
+}
+
+static int hw_dpci_portio_write(const struct hvm_io_handler *handler,
+                                uint64_t addr,
+                                uint32_t size,
+                                uint64_t data)
+{
+    struct domain *currd = current->domain;
+    uint32_t data32;
+
+    if ( addr == 0xcf8 )
+    {
+        ASSERT(size == 4);
+        currd->arch.pci_cf8 = data;
+        return X86EMUL_OKAY;
+    }
+
+    ASSERT((addr & 0xfffc) == 0xcfc);
+    size = min(size, 4 - ((uint32_t)addr & 3));
+    if ( size == 3 )
+        size = 2;
+    data32 = data;
+    if ( pci_cfg_ok(currd, addr & 3, size, &data32) )
+        pci_conf_write(currd->arch.pci_cf8, addr & 3, size, data32);
+
+    return X86EMUL_OKAY;
+}
+
 static const struct hvm_io_ops dpci_portio_ops = {
     .accept = dpci_portio_accept,
     .read = dpci_portio_read,
     .write = dpci_portio_write
 };
 
+static const struct hvm_io_ops hw_dpci_portio_ops = {
+    .accept = hw_dpci_portio_accept,
+    .read = hw_dpci_portio_read,
+    .write = hw_dpci_portio_write
+};
+
 void register_dpci_portio_handler(struct domain *d)
 {
     struct hvm_io_handler *handler = hvm_next_io_handler(d);
@@ -261,7 +328,10 @@ void register_dpci_portio_handler(struct domain *d)
         return;
 
     handler->type = IOREQ_TYPE_PIO;
-    handler->ops = &dpci_portio_ops;
+    if ( is_hardware_domain(d) )
+        handler->ops = &hw_dpci_portio_ops;
+    else
+        handler->ops = &dpci_portio_ops;
 }
 
 /*
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 767d0b0..4333bc1 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -2031,45 +2031,6 @@ static bool_t admin_io_okay(unsigned int port, unsigned int bytes,
     return ioports_access_permitted(d, port, port + bytes - 1);
 }
 
-static bool_t pci_cfg_ok(struct domain *currd, unsigned int start,
-                         unsigned int size, uint32_t *write)
-{
-    uint32_t machine_bdf;
-
-    if ( !is_hardware_domain(currd) )
-        return 0;
-
-    if ( !CF8_ENABLED(currd->arch.pci_cf8) )
-        return 1;
-
-    machine_bdf = CF8_BDF(currd->arch.pci_cf8);
-    if ( write )
-    {
-        const unsigned long *ro_map = pci_get_ro_map(0);
-
-        if ( ro_map && test_bit(machine_bdf, ro_map) )
-            return 0;
-    }
-    start |= CF8_ADDR_LO(currd->arch.pci_cf8);
-    /* AMD extended configuration space access? */
-    if ( CF8_ADDR_HI(currd->arch.pci_cf8) &&
-         boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
-         boot_cpu_data.x86 >= 0x10 && boot_cpu_data.x86 <= 0x17 )
-    {
-        uint64_t msr_val;
-
-        if ( rdmsr_safe(MSR_AMD64_NB_CFG, msr_val) )
-            return 0;
-        if ( msr_val & (1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT) )
-            start |= CF8_ADDR_HI(currd->arch.pci_cf8);
-    }
-
-    return !write ?
-           xsm_pci_config_permission(XSM_HOOK, currd, machine_bdf,
-                                     start, start + size - 1, 0) == 0 :
-           pci_conf_write_intercept(0, machine_bdf, start, size, write) >= 0;
-}
-
 uint32_t guest_io_read(unsigned int port, unsigned int bytes,
                        struct domain *currd)
 {
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 8bce213..e3595a9 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -966,6 +966,45 @@ void pci_check_disable_device(u16 seg, u8 bus, u8 devfn)
                      PCI_COMMAND, cword & ~PCI_COMMAND_MASTER);
 }
 
+bool_t pci_cfg_ok(struct domain *currd, unsigned int start,
+                         unsigned int size, uint32_t *write)
+{
+    uint32_t machine_bdf;
+
+    if ( !is_hardware_domain(currd) )
+        return 0;
+
+    if ( !CF8_ENABLED(currd->arch.pci_cf8) )
+        return 1;
+
+    machine_bdf = CF8_BDF(currd->arch.pci_cf8);
+    if ( write )
+    {
+        const unsigned long *ro_map = pci_get_ro_map(0);
+
+        if ( ro_map && test_bit(machine_bdf, ro_map) )
+            return 0;
+    }
+    start |= CF8_ADDR_LO(currd->arch.pci_cf8);
+    /* AMD extended configuration space access? */
+    if ( CF8_ADDR_HI(currd->arch.pci_cf8) &&
+         boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
+         boot_cpu_data.x86 >= 0x10 && boot_cpu_data.x86 <= 0x17 )
+    {
+        uint64_t msr_val;
+
+        if ( rdmsr_safe(MSR_AMD64_NB_CFG, msr_val) )
+            return 0;
+        if ( msr_val & (1ULL << AMD64_NB_CFG_CF8_EXT_ENABLE_BIT) )
+            start |= CF8_ADDR_HI(currd->arch.pci_cf8);
+    }
+
+    return !write ?
+           xsm_pci_config_permission(XSM_HOOK, currd, machine_bdf,
+                                     start, start + size - 1, 0) == 0 :
+           pci_conf_write_intercept(0, machine_bdf, start, size, write) >= 0;
+}
+
 /*
  * scan pci devices to add all existed PCI devices to alldevs_list,
  * and setup pci hierarchy in array bus2bridge.
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 0872401..f191773 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -162,6 +162,8 @@ const char *parse_pci(const char *, unsigned int *seg, unsigned int *bus,
 
 bool_t pcie_aer_get_firmware_first(const struct pci_dev *);
 
+bool_t pci_cfg_ok(struct domain *, unsigned int, unsigned int, uint32_t *);
+
 struct pirq;
 int msixtbl_pt_register(struct domain *, struct pirq *, uint64_t gtable);
 void msixtbl_pt_unregister(struct domain *, struct pirq *);
-- 
2.7.4 (Apple Git-66)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH RFC 11/12] xen/x86: allow a PVHv2 Dom0 to register PCI devices with Xen
  2016-07-29 16:28 [PATCH RFC 01/12] PVHv2 Dom0 Roger Pau Monne
                   ` (9 preceding siblings ...)
  2016-07-29 16:29 ` [PATCH RFC 10/12] xen/dpci: add a dpci passthrough handler for hardware domain Roger Pau Monne
@ 2016-07-29 16:29 ` Roger Pau Monne
  2016-07-29 16:29 ` [PATCH RFC 12/12] xen/x86: route legacy PCI interrupts to Dom0 Roger Pau Monne
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 49+ messages in thread
From: Roger Pau Monne @ 2016-07-29 16:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monne

This patch allows a PVHv2 Dom0 to register PCI devices with Xen using
PHYSDEVOP_pci_mmcfg_reserved, PHYSDEVOP_pci_device_add and
PHYSDEVOP_pci_device_remove. The functionality of the pci_device_add
function has been expanded so that for PVHv2 Dom0 it also sizes the PCI
device BARs and adds them to the Dom0 memory map using a 1:1 mapping.

Signed-off-by: Roger Pau Monne <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
This is incomplete: devices with SR-IOV BARs are not going to be properly
mapped, pci_remove_device will not unmap the BARs, and if the Dom0 OS
changes the position of the BARs the 1:1 mappings are not going to be
updated, so Dom0 is going to lose access to them...
---
 xen/arch/x86/hvm/hvm.c        |   6 ++
 xen/drivers/passthrough/pci.c | 153 +++++++++++++++++++++++++++++++++---------
 2 files changed, 127 insertions(+), 32 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index db4b2d6..9434540 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4029,6 +4029,12 @@ static long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( !is_pvh_vcpu(current) || !is_hardware_domain(current->domain) )
             return -ENOSYS;
         /* fall through */
+    case PHYSDEVOP_pci_mmcfg_reserved:
+    case PHYSDEVOP_pci_device_add:
+    case PHYSDEVOP_pci_device_remove:
+        if ( !is_hardware_domain(current->domain) )
+            return -ENOSYS;
+        /* fall through */
     case PHYSDEVOP_map_pirq:
     case PHYSDEVOP_unmap_pirq:
     case PHYSDEVOP_eoi:
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index e3595a9..18687e0 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -587,6 +587,52 @@ static void pci_enable_acs(struct pci_dev *pdev)
     pci_conf_write16(seg, bus, dev, func, pos + PCI_ACS_CTRL, ctrl);
 }
 
+static int pci_size_bar(unsigned int seg, unsigned int bus, unsigned int slot,
+                        unsigned int func, unsigned int base,
+                        unsigned int max_bars, unsigned int *index,
+                        uint64_t *addr, uint64_t *size)
+{
+    unsigned int idx = base + *index * 4;
+    u32 bar = pci_conf_read32(seg, bus, slot, func, idx);
+    u32 hi = 0;
+
+    *addr = *size = 0;
+
+    ASSERT((bar & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_MEMORY);
+    pci_conf_write32(seg, bus, slot, func, idx, ~0);
+    if ( (bar & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
+         PCI_BASE_ADDRESS_MEM_TYPE_64 )
+    {
+        if ( *index >= max_bars )
+        {
+            printk(XENLOG_WARNING
+                   "device %04x:%02x:%02x.%u with 64-bit BAR in last slot\n",
+                   seg, bus, slot, func);
+            return -EINVAL;
+        }
+        hi = pci_conf_read32(seg, bus, slot, func, idx + 4);
+        pci_conf_write32(seg, bus, slot, func, idx + 4, ~0);
+    }
+    *size = pci_conf_read32(seg, bus, slot, func, idx) &
+            PCI_BASE_ADDRESS_MEM_MASK;
+    if ( (bar & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
+         PCI_BASE_ADDRESS_MEM_TYPE_64 )
+    {
+        *size |= (u64)pci_conf_read32(seg, bus, slot, func, idx + 4) << 32;
+        pci_conf_write32(seg, bus, slot, func, idx + 4, hi);
+    }
+    else if ( *size )
+        *size |= (u64)~0 << 32;
+    pci_conf_write32(seg, bus, slot, func, idx, bar);
+    *size = - *size;
+    *addr = (bar & PCI_BASE_ADDRESS_MEM_MASK) | ((u64)hi << 32);
+    if ( (bar & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
+         PCI_BASE_ADDRESS_MEM_TYPE_64 )
+        ++*index;
+
+    return 0;
+}
+
 int pci_add_device(u16 seg, u8 bus, u8 devfn,
                    const struct pci_dev_info *info, nodeid_t node)
 {
@@ -651,7 +697,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
             {
                 unsigned int idx = pos + PCI_SRIOV_BAR + i * 4;
                 u32 bar = pci_conf_read32(seg, bus, slot, func, idx);
-                u32 hi = 0;
+                uint64_t addr;
 
                 if ( (bar & PCI_BASE_ADDRESS_SPACE) ==
                      PCI_BASE_ADDRESS_SPACE_IO )
@@ -662,38 +708,13 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
                            seg, bus, slot, func, i);
                     continue;
                 }
-                pci_conf_write32(seg, bus, slot, func, idx, ~0);
-                if ( (bar & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
-                     PCI_BASE_ADDRESS_MEM_TYPE_64 )
-                {
-                    if ( i >= PCI_SRIOV_NUM_BARS )
-                    {
+                ret = pci_size_bar(seg, bus, slot, func, pos + PCI_SRIOV_BAR,
+                                   PCI_SRIOV_NUM_BARS, &i, &addr,
+                                   &pdev->vf_rlen[i]);
+                if ( ret )
                         printk(XENLOG_WARNING
-                               "SR-IOV device %04x:%02x:%02x.%u with 64-bit"
-                               " vf BAR in last slot\n",
-                               seg, bus, slot, func);
-                        break;
-                    }
-                    hi = pci_conf_read32(seg, bus, slot, func, idx + 4);
-                    pci_conf_write32(seg, bus, slot, func, idx + 4, ~0);
-                }
-                pdev->vf_rlen[i] = pci_conf_read32(seg, bus, slot, func, idx) &
-                                   PCI_BASE_ADDRESS_MEM_MASK;
-                if ( (bar & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
-                     PCI_BASE_ADDRESS_MEM_TYPE_64 )
-                {
-                    pdev->vf_rlen[i] |= (u64)pci_conf_read32(seg, bus,
-                                                             slot, func,
-                                                             idx + 4) << 32;
-                    pci_conf_write32(seg, bus, slot, func, idx + 4, hi);
-                }
-                else if ( pdev->vf_rlen[i] )
-                    pdev->vf_rlen[i] |= (u64)~0 << 32;
-                pci_conf_write32(seg, bus, slot, func, idx, bar);
-                pdev->vf_rlen[i] = -pdev->vf_rlen[i];
-                if ( (bar & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
-                     PCI_BASE_ADDRESS_MEM_TYPE_64 )
-                    ++i;
+                    "failed to size BAR%u of SR-IOV device %04x:%02x:%02x.%u\n",
+                    i, seg, bus, slot, func);
             }
         }
         else
@@ -723,6 +744,74 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
 
     pci_enable_acs(pdev);
 
+    if ( is_hvm_domain(current->domain) )
+    {
+        unsigned int i, num_bars;
+        u8 htype;
+
+        ASSERT(is_hardware_domain(current->domain));
+
+        htype = pci_conf_read8(seg, bus, slot, func, PCI_HEADER_TYPE);
+
+        switch ( htype )
+        {
+            case PCI_HEADER_TYPE_NORMAL:
+                num_bars = 6;
+                break;
+            case PCI_HEADER_TYPE_BRIDGE:
+                num_bars = 2;
+                break;
+            default:
+                printk(XENLOG_WARNING
+                       "PCI device %04x:%02x:%02x.%u type %u not supported\n",
+                       seg, bus, slot, func, htype);
+                num_bars = 0;
+                break;
+        }
+
+        for ( i = 0; i < num_bars; i++ )
+        {
+            u32 bar, nr_pages;
+            u64 addr, size, bar_pfn;
+
+            bar = pci_conf_read32(seg, bus, slot, func,
+                                  PCI_BASE_ADDRESS_0 + i * 4);
+
+            if ( (bar & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO )
+            {
+                printk(XENLOG_DEBUG "PCI %04x:%02x:%02x.%u: found IO BAR#%u\n",
+                       seg, bus, slot, func, i);
+                continue;
+            }
+
+            ret = pci_size_bar(seg, bus, slot, func, PCI_BASE_ADDRESS_0,
+                               num_bars, &i, &addr, &size);
+            if ( ret )
+            {
+                printk(XENLOG_WARNING
+                       "failed to size BAR%u of device %04x:%02x:%02x.%u\n",
+                       i, seg, bus, slot, func);
+                continue;
+            }
+
+            printk(XENLOG_DEBUG
+                "PCI %04x:%02x:%02x.%u: adding BAR#%u addr %#lx size %#lx\n",
+                seg, bus, slot, func, i, addr, size);
+
+            nr_pages = DIV_ROUND_UP(size, PAGE_SIZE);
+            bar_pfn = PFN_DOWN(addr);
+            ret = map_mmio_regions(current->domain, _gfn(bar_pfn), nr_pages,
+                                   _mfn(bar_pfn));
+            if ( ret )
+            {
+                printk(XENLOG_WARNING
+                    "Failed to map %#lx - %#lx into Dom0 memory map: %d\n",
+                    bar_pfn, bar_pfn + nr_pages, ret);
+                return ret;
+            }
+        }
+    }
+
 out:
     pcidevs_unlock();
     if ( !ret )
-- 
2.7.4 (Apple Git-66)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH RFC 12/12] xen/x86: route legacy PCI interrupts to Dom0
  2016-07-29 16:28 [PATCH RFC 01/12] PVHv2 Dom0 Roger Pau Monne
                   ` (10 preceding siblings ...)
  2016-07-29 16:29 ` [PATCH RFC 11/12] xen/x86: allow a PVHv2 Dom0 to register PCI devices with Xen Roger Pau Monne
@ 2016-07-29 16:29 ` Roger Pau Monne
  2016-07-29 16:38 ` [PATCH RFC 01/12] PVHv2 Dom0 Roger Pau Monne
  2016-09-26 16:25 ` Jan Beulich
  13 siblings, 0 replies; 49+ messages in thread
From: Roger Pau Monne @ 2016-07-29 16:29 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Paul Durrant, Jan Beulich, Roger Pau Monne

This is done by adding some Dom0-specific logic to the IO APIC emulation inside
of Xen, so that writes to the IO APIC registers that unmask an interrupt also
take care of setting up that interrupt with Xen. A Dom0-specific EOI handler
also has to be used, since Xen doesn't know the topology of the PCI devices and
just has to pass through whatever Dom0 does.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Paul Durrant <paul.durrant@citrix.com>
---
 xen/arch/x86/hvm/irq.c       |   9 +++
 xen/arch/x86/hvm/vioapic.c   |  28 ++++++++-
 xen/arch/x86/physdev.c       |   6 +-
 xen/drivers/passthrough/io.c | 144 ++++++++++++++++++++++++++++++++++++++-----
 xen/include/asm-x86/hvm/io.h |   2 +
 xen/include/asm-x86/irq.h    |   5 ++
 xen/include/xen/hvm/irq.h    |   3 +
 xen/include/xen/iommu.h      |   1 +
 8 files changed, 178 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
index 5323d7c..be9b648 100644
--- a/xen/arch/x86/hvm/irq.c
+++ b/xen/arch/x86/hvm/irq.c
@@ -88,6 +88,15 @@ void hvm_pci_intx_assert(
     spin_unlock(&d->arch.hvm_domain.irq_lock);
 }
 
+void hvm_hw_gsi_assert(struct domain *d, unsigned int gsi)
+{
+
+    ASSERT(is_hardware_domain(d));
+    spin_lock(&d->arch.hvm_domain.irq_lock);
+    assert_gsi(d, gsi);
+    spin_unlock(&d->arch.hvm_domain.irq_lock);
+}
+
 static void __hvm_pci_intx_deassert(
     struct domain *d, unsigned int device, unsigned int intx)
 {
diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 611be87..18305be 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -148,6 +148,29 @@ static void vioapic_write_redirent(
         unmasked = unmasked && !ent.fields.mask;
     }
 
+    if ( is_hardware_domain(d) && unmasked )
+    {
+        int ret, gsi;
+
+        /* Interrupt has been unmasked */
+        gsi = idx;
+        ret = mp_register_gsi(gsi, ent.fields.trig_mode, ent.fields.polarity);
+        if ( ret && ret != -EEXIST )
+        {
+            gdprintk(XENLOG_WARNING,
+                     "%s: error registering GSI %d: %d\n", __func__, gsi, ret);
+        }
+        if ( !ret )
+        {
+            ret = physdev_map_pirq(DOMID_SELF, MAP_PIRQ_TYPE_GSI, &gsi, &gsi,
+                                   NULL);
+            BUG_ON(ret);
+
+            ret = pt_irq_bind_hw_domain(gsi);
+            BUG_ON(ret);
+        }
+    }
+
     *pent = ent;
 
     if ( idx == 0 )
@@ -409,7 +432,10 @@ void vioapic_update_EOI(struct domain *d, u8 vector)
         if ( iommu_enabled )
         {
             spin_unlock(&d->arch.hvm_domain.irq_lock);
-            hvm_dpci_eoi(d, gsi, ent);
+            if ( is_hardware_domain(d) )
+                hvm_hw_dpci_eoi(d, gsi, ent);
+            else
+                hvm_dpci_eoi(d, gsi, ent);
             spin_lock(&d->arch.hvm_domain.irq_lock);
         }
 
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 5a49796..8b13a36 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -19,10 +19,6 @@
 #include <xsm/xsm.h>
 #include <asm/p2m.h>
 
-int physdev_map_pirq(domid_t, int type, int *index, int *pirq_p,
-                     struct msi_info *);
-int physdev_unmap_pirq(domid_t, int pirq);
-
 #include "x86_64/mmconfig.h"
 
 #ifndef COMPAT
@@ -94,7 +90,7 @@ int physdev_map_pirq(domid_t domid, int type, int *index, int *pirq_p,
     int pirq, irq, ret = 0;
     void *map_data = NULL;
 
-    if ( domid == DOMID_SELF && is_hvm_domain(d) )
+    if ( domid == DOMID_SELF && !is_hardware_domain(d) && is_hvm_domain(d) )
     {
         /*
          * Only makes sense for vector-based callback, else HVM-IRQ logic
diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index 9e6b46c..56af98c 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -159,26 +159,29 @@ static int pt_irq_guest_eoi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
 static void pt_irq_time_out(void *data)
 {
     struct hvm_pirq_dpci *irq_map = data;
-    const struct hvm_irq_dpci *dpci;
     const struct dev_intx_gsi_link *digl;
 
     spin_lock(&irq_map->dom->event_lock);
 
-    dpci = domain_get_irq_dpci(irq_map->dom);
-    ASSERT(dpci);
-    list_for_each_entry ( digl, &irq_map->digl_list, list )
+    if ( !is_hardware_domain(irq_map->dom) )
     {
-        unsigned int guest_gsi = hvm_pci_intx_gsi(digl->device, digl->intx);
-        const struct hvm_girq_dpci_mapping *girq;
-
-        list_for_each_entry ( girq, &dpci->girq[guest_gsi], list )
+        const struct hvm_irq_dpci *dpci = domain_get_irq_dpci(irq_map->dom);
+        ASSERT(dpci);
+        list_for_each_entry ( digl, &irq_map->digl_list, list )
         {
-            struct pirq *pirq = pirq_info(irq_map->dom, girq->machine_gsi);
+            unsigned int guest_gsi = hvm_pci_intx_gsi(digl->device, digl->intx);
+            const struct hvm_girq_dpci_mapping *girq;
+
+            list_for_each_entry ( girq, &dpci->girq[guest_gsi], list )
+            {
+                struct pirq *pirq = pirq_info(irq_map->dom, girq->machine_gsi);
 
-            pirq_dpci(pirq)->flags |= HVM_IRQ_DPCI_EOI_LATCH;
+                pirq_dpci(pirq)->flags |= HVM_IRQ_DPCI_EOI_LATCH;
+            }
+            hvm_pci_intx_deassert(irq_map->dom, digl->device, digl->intx);
         }
-        hvm_pci_intx_deassert(irq_map->dom, digl->device, digl->intx);
-    }
+    } else
+        irq_map->flags |= HVM_IRQ_DPCI_EOI_LATCH;
 
     pt_pirq_iterate(irq_map->dom, pt_irq_guest_eoi, NULL);
 
@@ -557,6 +560,85 @@ int pt_irq_create_bind(
     return 0;
 }
 
+int pt_irq_bind_hw_domain(int gsi)
+{
+    struct domain *d = hardware_domain;
+    struct hvm_pirq_dpci *pirq_dpci;
+    struct hvm_irq_dpci *hvm_irq_dpci;
+    struct pirq *info;
+    int rc;
+
+    if ( gsi < 0 || gsi >= d->nr_pirqs )
+        return -EINVAL;
+
+restart:
+    spin_lock(&d->event_lock);
+
+    hvm_irq_dpci = domain_get_irq_dpci(d);
+    if ( hvm_irq_dpci == NULL )
+    {
+        unsigned int i;
+
+        hvm_irq_dpci = xzalloc(struct hvm_irq_dpci);
+        if ( hvm_irq_dpci == NULL )
+        {
+            spin_unlock(&d->event_lock);
+            return -ENOMEM;
+        }
+        for ( i = 0; i < NR_HVM_IRQS; i++ )
+            INIT_LIST_HEAD(&hvm_irq_dpci->girq[i]);
+
+        d->arch.hvm_domain.irq.dpci = hvm_irq_dpci;
+    }
+
+    info = pirq_get_info(d, gsi);
+    if ( !info )
+    {
+        spin_unlock(&d->event_lock);
+        return -ENOMEM;
+    }
+    pirq_dpci = pirq_dpci(info);
+
+    /*
+     * A crude 'while' loop with us dropping the spinlock and giving
+     * the softirq_dpci a chance to run.
+     * We MUST check for this condition as the softirq could be scheduled
+     * and hasn't run yet. Note that this code replaced tasklet_kill which
+     * would have spun forever and would do the same thing (wait to flush out
+     * outstanding hvm_dirq_assist calls.
+     */
+    if ( pt_pirq_softirq_active(pirq_dpci) )
+    {
+        spin_unlock(&d->event_lock);
+        cpu_relax();
+        goto restart;
+    }
+
+    pirq_dpci->dom = d;
+    pirq_dpci->flags = HVM_IRQ_DPCI_MAPPED |
+                       HVM_IRQ_DPCI_MACH_PCI |
+                       HVM_IRQ_DPCI_GUEST_PCI;
+
+    /* Init timer before binding */
+    if ( pt_irq_need_timer(pirq_dpci->flags) )
+        init_timer(&pirq_dpci->timer, pt_irq_time_out, pirq_dpci, 0);
+
+    rc = pirq_guest_bind(d->vcpu[0], info, gsi > 15 ? BIND_PIRQ__WILL_SHARE :
+                                                      0);
+    if ( unlikely(rc) )
+    {
+        if ( pt_irq_need_timer(pirq_dpci->flags) )
+            kill_timer(&pirq_dpci->timer);
+        pirq_dpci->dom = NULL;
+        pirq_cleanup_check(info, d);
+        spin_unlock(&d->event_lock);
+        return rc;
+    }
+
+    spin_unlock(&d->event_lock);
+    return 0;
+}
+
 int pt_irq_destroy_bind(
     struct domain *d, xen_domctl_bind_pt_irq_t *pt_irq_bind)
 {
@@ -819,11 +901,19 @@ static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
             return;
         }
 
-        list_for_each_entry ( digl, &pirq_dpci->digl_list, list )
+        if ( is_hardware_domain(d) )
         {
-            hvm_pci_intx_assert(d, digl->device, digl->intx);
+            hvm_hw_gsi_assert(d, pirq->pirq);
             pirq_dpci->pending++;
         }
+        else
+        {
+            list_for_each_entry ( digl, &pirq_dpci->digl_list, list )
+            {
+                hvm_pci_intx_assert(d, digl->device, digl->intx);
+                pirq_dpci->pending++;
+            }
+        }
 
         if ( pirq_dpci->flags & HVM_IRQ_DPCI_TRANSLATE )
         {
@@ -899,6 +989,32 @@ unlock:
     spin_unlock(&d->event_lock);
 }
 
+void hvm_hw_dpci_eoi(struct domain *d, unsigned int gsi,
+                     const union vioapic_redir_entry *ent)
+{
+    struct pirq *pirq = pirq_info(d, gsi);
+    struct hvm_pirq_dpci *pirq_dpci;
+
+    ASSERT(is_hardware_domain(d) && iommu_enabled);
+
+    if ( pirq == NULL )
+        return;
+
+    pirq_dpci = pirq_dpci(pirq);
+    ASSERT(pirq_dpci != NULL);
+
+    spin_lock(&d->event_lock);
+    if ( --pirq_dpci->pending || (ent && ent->fields.mask) ||
+         !pt_irq_need_timer(pirq_dpci->flags) )
+        goto unlock;
+
+    stop_timer(&pirq_dpci->timer);
+    pirq_guest_eoi(pirq);
+
+unlock:
+    spin_unlock(&d->event_lock);
+}
+
 /*
  * Note: 'pt_pirq_softirq_reset' can clear the STATE_SCHED before we get to
  * doing it. If that is the case we let 'pt_pirq_softirq_reset' do ref-counting.
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index e9b3f83..8de4a8f 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -125,6 +125,8 @@ int handle_pio(uint16_t port, unsigned int size, int dir);
 void hvm_interrupt_post(struct vcpu *v, int vector, int type);
 void hvm_dpci_eoi(struct domain *d, unsigned int guest_irq,
                   const union vioapic_redir_entry *ent);
+void hvm_hw_dpci_eoi(struct domain *d, unsigned int gsi,
+                     const union vioapic_redir_entry *ent);
 void msix_write_completion(struct vcpu *);
 void msixtbl_init(struct domain *d);
 
diff --git a/xen/include/asm-x86/irq.h b/xen/include/asm-x86/irq.h
index 7efdd37..07f21ab 100644
--- a/xen/include/asm-x86/irq.h
+++ b/xen/include/asm-x86/irq.h
@@ -201,4 +201,9 @@ bool_t cpu_has_pending_apic_eoi(void);
 
 static inline void arch_move_irqs(struct vcpu *v) { }
 
+struct msi_info;
+int physdev_map_pirq(domid_t, int type, int *index, int *pirq_p,
+                     struct msi_info *);
+int physdev_unmap_pirq(domid_t, int pirq);
+
 #endif /* _ASM_HW_IRQ_H */
diff --git a/xen/include/xen/hvm/irq.h b/xen/include/xen/hvm/irq.h
index 4c9cb20..2ffaf35 100644
--- a/xen/include/xen/hvm/irq.h
+++ b/xen/include/xen/hvm/irq.h
@@ -122,6 +122,9 @@ void hvm_isa_irq_assert(
 void hvm_isa_irq_deassert(
     struct domain *d, unsigned int isa_irq);
 
+/* Modify state of a hardware domain GSI */
+void hvm_hw_gsi_assert(struct domain *d, unsigned int gsi);
+
 void hvm_set_pci_link_route(struct domain *d, u8 link, u8 isa_irq);
 
 int hvm_inject_msi(struct domain *d, uint64_t addr, uint32_t data);
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 5803e3f..07c6c40 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -114,6 +114,7 @@ struct pirq;
 int hvm_do_IRQ_dpci(struct domain *, struct pirq *);
 int pt_irq_create_bind(struct domain *, xen_domctl_bind_pt_irq_t *);
 int pt_irq_destroy_bind(struct domain *, xen_domctl_bind_pt_irq_t *);
+int pt_irq_bind_hw_domain(int gsi);
 
 void hvm_dpci_isairq_eoi(struct domain *d, unsigned int isairq);
 struct hvm_irq_dpci *domain_get_irq_dpci(const struct domain *);
-- 
2.7.4 (Apple Git-66)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 01/12] PVHv2 Dom0
  2016-07-29 16:28 [PATCH RFC 01/12] PVHv2 Dom0 Roger Pau Monne
                   ` (11 preceding siblings ...)
  2016-07-29 16:29 ` [PATCH RFC 12/12] xen/x86: route legacy PCI interrupts to Dom0 Roger Pau Monne
@ 2016-07-29 16:38 ` Roger Pau Monne
  2016-09-26 16:25 ` Jan Beulich
  13 siblings, 0 replies; 49+ messages in thread
From: Roger Pau Monne @ 2016-07-29 16:38 UTC (permalink / raw)
  To: xen-devel; +Cc: boris.ostrovsky

Hello,

Sorry, I've failed to Cc both Konrad and Boris on this...

On Fri, Jul 29, 2016 at 06:28:55PM +0200, Roger Pau Monne wrote:
> Hello,
> 
> This is a very rough implementation of a PVHv2 Dom0. There are still a bunch 
> of things that will not work properly (like SR-IOV, MSI, MSI-X...), but it 
> *should* be able to boot a Dom0 kernel and it doesn't require using PIRQs.
> 
> There are some changes compared to a traditional PV or PVH Dom0:
> 
>  - An emulated local APIC (for each vCPU) and IO APIC is provided to Dom0.
>  - At the start of day only the low 1MB and the ACPI e820 regions are mapped 
>    into the guest using 1:1 mappings (for UEFI systems mapping the low 1MB 
>    probably doesn't make any sense, but alas).
>  - The MADT has been replaced in order to reflect the real topology seen by 
>    Dom0.
>  - In order to have the BARs of PCI devices mapped Dom0 must call 
>    PHYSDEVOP_pci_device_add. See the notes on patch 11 for more information 
>    (and what's missing).
>  - ATM only legacy PCI interrupts are supported. This is implemented by 
>    looking at the writes Dom0 makes to the emulated IO APIC and translating 
>    them into the real hardware. Note that MSI or MSI-X capabilities are 
>    _not_ masked from the PCI capabilities list, so the user must avoid using
>    them.
>  - PCIe Enhanced Configuration Mechanism regions are not yet mapped into 
>    Dom0 physmap.
>  - Some ACPI tables are zapped (it's signature is inverted) to prevent Dom0 
>    from poking at them, those are: HPET, SLIT, SRAT, MPST and PMTT.
> 
> This is still very experimental, I've been able to boot a FreeBSD Dom0 using 
> 2GB and 4GB of memory assigned to it, but if I try to use 6GB Xen gets a NMI 
> (I've got no idea why yet). I don't think it's worth delaying this more, 
> because it's going to take me some time to finish all this work (MSI, 
> MSI-X, bug hunting...), and in the meantime people can already take a look 
> at what's done (or half-done).
> 
> I've pushed the whole series to my git repository:
> 
> git://xenbits.xen.org/people/royger/xen.git pvhv2_dom0_rfc
> 
> It contains two patches from Boris and Anthony, that are a pre-requisite to 
> this series.
> 
> As usual, thanks for taking the time to look into it, hope it doesn't make 
> your eyes bleed much (slight bleeding is expected).
> 
> Roger.
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 01/12] x86/paging: introduce paging_set_allocation
  2016-07-29 16:28 ` [PATCH RFC 01/12] x86/paging: introduce paging_set_allocation Roger Pau Monne
@ 2016-07-29 16:47   ` Andrew Cooper
  2016-08-02  9:47     ` Roger Pau Monne
  2016-08-01  8:57   ` Tim Deegan
  1 sibling, 1 reply; 49+ messages in thread
From: Andrew Cooper @ 2016-07-29 16:47 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel; +Cc: George Dunlap, Tim Deegan, Jan Beulich

On 29/07/16 17:28, Roger Pau Monne wrote:
> diff --git a/xen/arch/x86/mm/paging.c b/xen/arch/x86/mm/paging.c
> index 107fc8c..1b270df 100644
> --- a/xen/arch/x86/mm/paging.c
> +++ b/xen/arch/x86/mm/paging.c
> @@ -953,6 +953,22 @@ void paging_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
>          safe_write_pte(p, new);
>  }
>  
> +int paging_set_allocation(struct domain *d, unsigned long pages)
> +{
> +    int rc;
> +
> +    ASSERT(paging_mode_enabled(d));
> +
> +    paging_lock(d);
> +    if ( hap_enabled(d) )
> +        rc = hap_set_allocation(d, pages, NULL);
> +    else
> +        rc = sh_set_allocation(d, pages, NULL);

(without looking at the rest of the series) Your NMI is probably a
watchdog timeout from this call, as passing NULL means "non-preemptible".

As this is for the construction of dom0, it would be better to take a
preemptible pointer, loop in construct_dom0(), with a
process_pending_softirqs() call in between.
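
For example (an untested sketch, assuming paging_set_allocation() grows a
'preempted' out-parameter which it simply forwards to the underlying
{hap,sh}_set_allocation() calls):

int preempted;

do {
    preempted = 0;
    rc = paging_set_allocation(d, dom0_paging_pages(d, nr_pages),
                               &preempted);
    if ( preempted )
        process_pending_softirqs();
} while ( preempted );

That keeps a large dom0 allocation from hogging the CPU for long enough to
trip the watchdog.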

> +    paging_unlock(d);
> +
> +    return rc;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
> index c22362f..452e22e 100644
> --- a/xen/arch/x86/mm/shadow/common.c
> +++ b/xen/arch/x86/mm/shadow/common.c
> @@ -1604,9 +1604,8 @@ shadow_free_p2m_page(struct domain *d, struct page_info *pg)
>   * Input will be rounded up to at least shadow_min_acceptable_pages(),
>   * plus space for the p2m table.
>   * Returns 0 for success, non-zero for failure. */
> -static unsigned int sh_set_allocation(struct domain *d,
> -                                      unsigned int pages,
> -                                      int *preempted)
> +unsigned int sh_set_allocation(struct domain *d, unsigned int pages,
> +                               int *preempted)
>  {
>      struct page_info *sp;
>      unsigned int lower_bound;
> diff --git a/xen/include/asm-x86/hap.h b/xen/include/asm-x86/hap.h
> index c613836..e3c9c98 100644
> --- a/xen/include/asm-x86/hap.h
> +++ b/xen/include/asm-x86/hap.h
> @@ -46,7 +46,8 @@ int   hap_track_dirty_vram(struct domain *d,
>                             XEN_GUEST_HANDLE_64(uint8) dirty_bitmap);
>  
>  extern const struct paging_mode *hap_paging_get_mode(struct vcpu *);
> -void hap_set_alloc_for_pvh_dom0(struct domain *d, unsigned long num_pages);
> +unsigned int hap_set_allocation(struct domain *d, unsigned int pages,
> +				int *preempted);

I also note from this change that there is an unsigned long => unsigned
int truncation in the internals of *_set_allocation().  This should
definitely be fixed, although possibly wants to be a separate change.
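
i.e. (a sketch of the widened prototypes):

unsigned int hap_set_allocation(struct domain *d, unsigned long pages,
                                int *preempted);
unsigned int sh_set_allocation(struct domain *d, unsigned long pages,
                               int *preempted);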

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 03/12] xen/x86: allow the emulated APICs to be enabled for the hardware domain
  2016-07-29 16:28 ` [PATCH RFC 03/12] xen/x86: allow the emulated APICs to be enabled for the hardware domain Roger Pau Monne
@ 2016-07-29 17:50   ` Andrew Cooper
  2016-08-01 11:23     ` Roger Pau Monne
  0 siblings, 1 reply; 49+ messages in thread
From: Andrew Cooper @ 2016-07-29 17:50 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel; +Cc: Jan Beulich

On 29/07/16 17:28, Roger Pau Monne wrote:
>          if ( is_hardware_domain(d) )
> -            config->emulation_flags |= XEN_X86_EMU_PIT;
> -        if ( config->emulation_flags != 0 &&
> -             (config->emulation_flags !=
> -              (is_hvm_domain(d) ? XEN_X86_EMU_ALL : XEN_X86_EMU_PIT)) )
> +            emflags |= XEN_X86_EMU_PIT;
> +
> +        if ( (is_hardware_domain(d) ?
> +              (is_hvm_domain(d) && emflags !=
> +              (XEN_X86_EMU_PIT|XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC)) :
> +              (emflags && (is_hvm_domain(d) ? (emflags != XEN_X86_EMU_ALL) :
> +                                              (emflags != XEN_X86_EMU_PIT)))) )

This is getting very complicated.

It is rather important (security wise) and not a hotpath.  Could I
please request that you move this logic to a helper such as:

static bool emulation_flags_ok(const struct domain *d, uint32_t emflags)

and make this more readable.  Any decent compiler will inline and
simplify it appropriately.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 04/12] xen/x86: split Dom0 build into PV and PVHv2
  2016-07-29 16:28 ` [PATCH RFC 04/12] xen/x86: split Dom0 build into PV and PVHv2 Roger Pau Monne
@ 2016-07-29 17:57   ` Andrew Cooper
  2016-08-01 11:36     ` Roger Pau Monne
  0 siblings, 1 reply; 49+ messages in thread
From: Andrew Cooper @ 2016-07-29 17:57 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel; +Cc: Jan Beulich

On 29/07/16 17:28, Roger Pau Monne wrote:
> Split the Dom0 builder into two different functions, one for PV (and classic
> PVH), and another one for PVHv2. Introduce a new command line parameter,
> dom0hvm in order to request the creation of a PVHv2 Dom0.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  xen/arch/x86/domain_build.c | 27 ++++++++++++++++++++++++++-
>  xen/arch/x86/setup.c        |  9 +++++++++

A patch to docs/misc/xen-command-line.markdown please.

>  2 files changed, 35 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
> index 09d79be..c0ef40f 100644
> --- a/xen/arch/x86/domain_build.c
> +++ b/xen/arch/x86/domain_build.c
> @@ -952,7 +952,7 @@ static int __init setup_permissions(struct domain *d)
>      return rc;
>  }
>  
> -int __init construct_dom0(
> +static int __init construct_dom0_pv(
>      struct domain *d,
>      const module_t *image, unsigned long image_headroom,
>      module_t *initrd,
> @@ -1647,6 +1647,31 @@ out:
>      return rc;
>  }
>  
> +static int __init construct_dom0_hvm(struct domain *d, const module_t *image,
> +                                     unsigned long image_headroom,
> +                                     module_t *initrd,
> +                                     void *(*bootstrap_map)(const module_t *),
> +                                     char *cmdline)
> +{
> +
> +    printk("** Building a PVH Dom0 **\n");

Some naming curiosities here, especially given the parameter name.

> +
> +    return 0;
> +}
> +
> +int __init construct_dom0(struct domain *d, const module_t *image,
> +                          unsigned long image_headroom, module_t *initrd,
> +                          void *(*bootstrap_map)(const module_t *),
> +                          char *cmdline)
> +{
> +
> +    return is_hvm_domain(d) ?
> +            construct_dom0_hvm(d, image, image_headroom, initrd,
> +                               bootstrap_map, cmdline) :
> +            construct_dom0_pv(d, image, image_headroom, initrd, bootstrap_map,
> +                              cmdline);

This could be slightly less awkwardly split as:

(is_hvm_domain(d) ? construct_dom0_hvm : construct_dom0_pv)
(d, image, image_headroom, initrd, bootstrap_map, cmdline);

with some appropriate indentation/alignment.

~Andrew


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 05/12] xen/x86: make print_e820_memory_map global
  2016-07-29 16:29 ` [PATCH RFC 05/12] xen/x86: make print_e820_memory_map global Roger Pau Monne
@ 2016-07-29 17:57   ` Andrew Cooper
  0 siblings, 0 replies; 49+ messages in thread
From: Andrew Cooper @ 2016-07-29 17:57 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel; +Cc: Jan Beulich


On 29/07/16 17:29, Roger Pau Monne wrote:
> So that it can be called from the Dom0 builder.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 06/12] xen/x86: populate PVHv2 Dom0 physical memory map
  2016-07-29 16:29 ` [PATCH RFC 06/12] xen/x86: populate PVHv2 Dom0 physical memory map Roger Pau Monne
@ 2016-07-29 19:04   ` Andrew Cooper
  2016-08-02  9:19     ` Roger Pau Monne
  0 siblings, 1 reply; 49+ messages in thread
From: Andrew Cooper @ 2016-07-29 19:04 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel; +Cc: Jan Beulich

On 29/07/16 17:29, Roger Pau Monne wrote:
> Craft the Dom0 e820 memory map and populate it.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>  xen/arch/x86/domain_build.c | 199 ++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 193 insertions(+), 6 deletions(-)
>
> diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
> index c0ef40f..cb8ecbd 100644
> --- a/xen/arch/x86/domain_build.c
> +++ b/xen/arch/x86/domain_build.c
> @@ -43,6 +43,11 @@ static long __initdata dom0_nrpages;
>  static long __initdata dom0_min_nrpages;
>  static long __initdata dom0_max_nrpages = LONG_MAX;
>  
> +/* GFN of the identity map for EPT. */
> +#define HVM_IDENT_PT_GFN  0xfeffeu

The IDENT_PT is only needed on Gen1 VT-x hardware which doesn't support
unrestricted-guest mode.  OTOH, it needs to be matched with VM86_TSS,
which is also needed if hardware doesn't support unrestricted-guest. 
Also, the IDENT_PT can live anywhere in GFN space.  0xfeffe is just
convention for HVM guests as nothing else interesting lives there, but a
PVH dom0 will want to ensure it doesn't alias anything interesting it
might wish to map.

Now I look at it, there is substantial WTF.  The domain builder makes
IDENT_PT, but HVMLoader makes VM86_TSS.  I presume this is a historical
side effect of IDENT_PT being a prerequisite to even executing
hvmloader, while VM86_TSS was only a prerequisite to executing the
rombios payload.  Either way, eww :(

I think the VM86_TSS setup needs to move to unconditionally being beside
IDENT_PT setup in the domain builder, and mirrored here in dom0_hvm()
creation.  I don't think it is reasonable to expect an HVMLite payload
to set up its own VM86_TSS if it didn't set up IDENT_PT.  (OTOH, the
chances of an HVMLite guest ever needing to return to real mode are slim,
but in the name of flexibility, it would be nice not to rule it out).

Longterm, it would be nice to disentangle the unrestricted-guest support
from the main vmx code, and make it able to be compiled out.  There is
lots of it, and it is all but dead on modern hardware.
> @@ -591,6 +610,8 @@ static __init void pvh_setup_e820(struct domain *d, unsigned long nr_pages)
>          {
>              cur_pages += pages;
>          }
> +        ASSERT((entry_guest->addr & ~PAGE_MASK) == 0 &&
> +               (entry_guest->size & ~PAGE_MASK) == 0);

This would be clearer as ASSERT(IS_ALIGNED(entry_guest->addr, PAGE_SIZE));

(although I suspect the IS_ALIGNED() predicate is newer than your first
iteration of this code.)

>   next:
>          d->arch.nr_e820++;
>          entry_guest++;
> @@ -1631,7 +1652,7 @@ static int __init construct_dom0_pv(
>          dom0_update_physmap(d, pfn, mfn, 0);
>  
>          pvh_map_all_iomem(d, nr_pages);
> -        pvh_setup_e820(d, nr_pages);
> +        hvm_setup_e820(d, nr_pages);
>      }
>  
>      if ( d->domain_id == hardware_domid )
> @@ -1647,15 +1668,181 @@ out:
>      return rc;
>  }
>  
> +/* Helper to convert from bytes into human-readable form. */
> +static void __init pretty_print_bytes(uint64_t size)
> +{
> +    const char* units[] = {"B", "KB", "MB", "GB", "TB"};

static const char *units[] __initconst

However, this particular array would be far smaller as

static const char units[][3] __initconst = { "B", ... };

as it avoids embedding 5x8byte pointers to get at 14 useful bytes of
information.

> +    int i = 0;
> +
> +    while ( ++i < sizeof(units) && size >= 1024 )
> +        size >>= 10; /* size /= 1024 */
> +
> +    printk("%4" PRIu64 "%2s", size, units[i-1]);

Wouldn't it be better to introduce a new %p format identifier to do
this?  (Linux doesn't appear to have an existing format identifier which
we can 'borrow')

It looks potentially useful elsewhere, and avoids the awkward

printk("Order %2u: ", MAX_ORDER - i);
pretty_print_bytes(((uint64_t)1 << (MAX_ORDER - i + PAGE_SHIFT)) *
                   hvm_mem_stats[MAX_ORDER - i]);
printk("\n");

constructs you have below.
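
Failing a new format identifier, even having the helper format into a
caller-provided buffer would avoid splitting one line of output across
three printk() calls.  Rough sketch (pretty_bytes() is made up; it only
relies on plain snprintf()):

static void __init pretty_bytes(char *buf, size_t len, uint64_t size)
{
    static const char units[][3] __initconst =
        { "B", "KB", "MB", "GB", "TB" };
    unsigned int i = 0;

    while ( i < ARRAY_SIZE(units) - 1 && size >= 1024 )
    {
        size >>= 10;
        i++;
    }
    snprintf(buf, len, "%4" PRIu64 "%s", size, units[i]);
}

with the stats loop then becoming:

    char buf[16];

    pretty_bytes(buf, sizeof(buf),
                 (uint64_t)hvm_mem_stats[MAX_ORDER - i] <<
                 (MAX_ORDER - i + PAGE_SHIFT));
    printk("Order %2u: %s\n", MAX_ORDER - i, buf);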


> +}
> +
> +/* Calculate the biggest usable order given a size in bytes. */
> +static inline unsigned int get_order(uint64_t size)
> +{
> +    unsigned int order;
> +    uint64_t pg;
> +
> +    ASSERT((size & ~PAGE_MASK) == 0);
> +    pg = PFN_DOWN(size);
> +    for ( order = 0; pg >= (1 << (order + 1)); order++ );
> +
> +    return order;
> +}

We already have get_order_from_bytes() and get_order_from_pages(), the
latter of which looks like it will suit your usecase.

As a TODO item, I really would like to borrow some of the Linux
infrastructure to calculate orders of constants at compile time, because
a large number of callers of the existing get_order_*() functions are
performing unnecessary runtime calculation.  For those which need
runtime calculation, some careful use of ffs() would be preferable to a
loop.
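
If a round-down variant (largest order that still fits) does turn out to be
what's wanted here, then assuming the usual flsl() semantics (1-based index
of the most significant set bit) the loop collapses to something like:

/* Largest order such that (1UL << order) pages still fit in 'size' bytes. */
static inline unsigned int largest_order(uint64_t size)
{
    unsigned long pg = PFN_DOWN(size);

    ASSERT(IS_ALIGNED(size, PAGE_SIZE) && pg);
    return flsl(pg) - 1;
}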

> +
> +/* Populate an HVM memory range using the biggest possible order. */
> +static void __init hvm_populate_memory_range(struct domain *d, uint64_t start,
> +                                             uint64_t size)
> +{
> +    static unsigned int __initdata memflags = MEMF_no_dma|MEMF_exact_node;
> +    unsigned int order;
> +    struct page_info *page;
> +    int rc;
> +
> +    ASSERT((size & ~PAGE_MASK) == 0 && (start & ~PAGE_MASK) == 0);
> +
> +    order = MAX_ORDER;
> +    while ( size != 0 )
> +    {
> +        order = min(get_order(size), order);
> +        page = alloc_domheap_pages(d, order, memflags);
> +        if ( page == NULL )
> +        {
> +            if ( order == 0 && memflags )
> +            {
> +                /* Try again without any memflags. */
> +                memflags = 0;
> +                order = MAX_ORDER;
> +                continue;
> +            }
> +            if ( order == 0 )
> +                panic("Unable to allocate memory with order 0!\n");
> +            order--;
> +            continue;
> +        }

It would be far more efficient to try and allocate only 1G and 2M
blocks.  Most of memory is free at this point, and it would allow the
use of HAP superpage mappings, which will be a massive performance boost
if they aren't shattered.
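
Roughly something like the below (only a sketch, in the context of
hvm_populate_memory_range(); the retry-without-memflags fallback is omitted
for brevity):

static const unsigned int orders[] =
    { PAGE_ORDER_1G, PAGE_ORDER_2M, PAGE_ORDER_4K };

while ( size != 0 )
{
    struct page_info *page = NULL;
    unsigned int i, order = 0;

    for ( i = 0; i < ARRAY_SIZE(orders); i++ )
    {
        order = orders[i];
        /* The chunk must fit and keep the gfn superpage-aligned. */
        if ( size < ((uint64_t)PAGE_SIZE << order) ||
             !IS_ALIGNED(PFN_DOWN(start), 1UL << order) )
            continue;
        page = alloc_domheap_pages(d, order, memflags);
        if ( page )
            break;
    }
    if ( page == NULL )
        panic("Unable to populate %#" PRIx64 "\n", start);

    guest_physmap_add_page(d, _gfn(PFN_DOWN(start)),
                           _mfn(page_to_mfn(page)), order);
    start += (uint64_t)PAGE_SIZE << order;
    size  -= (uint64_t)PAGE_SIZE << order;
}

That way the p2m only ever sees 1G/2M/4K-sized, naturally aligned chunks.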

> +
> +        hvm_mem_stats[order]++;
> +        rc = guest_physmap_add_page(d, _gfn(PFN_DOWN(start)),
> +                                    _mfn(page_to_mfn(page)), order);
> +        if ( rc != 0 )
> +            panic("Failed to populate memory: [%" PRIx64 " - %" PRIx64 "] %d\n",
> +                  start, start + (((uint64_t)1) << (order + PAGE_SHIFT)), rc);
> +        start += ((uint64_t)1) << (order + PAGE_SHIFT);
> +        size -= ((uint64_t)1) << (order + PAGE_SHIFT);
> +        if ( (size & 0xffffffff) == 0 )
> +            process_pending_softirqs();
> +    }
> +
> +}
> +
> +static int __init hvm_setup_p2m(struct domain *d)
> +{
> +    struct vcpu *v = d->vcpu[0];
> +    unsigned long nr_pages;
> +    int i;
> +
> +    printk("** Preparing memory map **\n");
> +
> +    /*
> +     * Subtract one page for the EPT identity page table and two pages
> +     * for the MADT replacement.
> +     */
> +    nr_pages = compute_dom0_nr_pages(d, NULL, 0) - 3;
> +
> +    hvm_setup_e820(d, nr_pages);
> +    paging_set_allocation(d, dom0_paging_pages(d, nr_pages));
> +
> +    printk("Dom0 memory map:\n");
> +    print_e820_memory_map(d->arch.e820, d->arch.nr_e820);
> +
> +    printk("** Populating memory map **\n");
> +    /* Populate memory map. */
> +    for ( i = 0; i < d->arch.nr_e820; i++ )
> +    {
> +        if ( d->arch.e820[i].type != E820_RAM )
> +            continue;
> +
> +        hvm_populate_memory_range(d, d->arch.e820[i].addr,
> +                                  d->arch.e820[i].size);
> +    }
> +
> +    printk("Memory allocation stats:\n");
> +    for ( i = 0; i <= MAX_ORDER; i++ )
> +    {
> +        if ( hvm_mem_stats[MAX_ORDER - i] != 0 )
> +        {
> +            printk("Order %2u: ", MAX_ORDER - i);
> +            pretty_print_bytes(((uint64_t)1 << (MAX_ORDER - i + PAGE_SHIFT)) *
> +                               hvm_mem_stats[MAX_ORDER - i]);
> +            printk("\n");
> +        }
> +    }
> +
> +    if ( cpu_has_vmx && paging_mode_hap(d) && !vmx_unrestricted_guest(v) )
> +    {
> +        struct vcpu *saved_current;
> +        struct page_info *page;
> +        uint32_t *ident_pt;
> +
> +        /*
> +         * Identity-map page table is required for running with CR0.PG=0
> +         * when using Intel EPT. Create a 32-bit non-PAE page directory of
> +         * superpages.
> +         */
> +        page = alloc_domheap_pages(d, 0, 0);
> +        if ( unlikely(!page) )
> +        {
> +            printk("Unable to allocate page for identity map\n");
> +            return -ENOMEM;
> +        }
> +
> +        saved_current = current;
> +        set_current(v);

What is this swizzle of current for?  I presume you hit an assertion
somewhere?  Given how late we are in the boot process, it might be worth
formally context switching to d0v0 earlier.

> +
> +        ident_pt = __map_domain_page(page);
> +        for ( i = 0; i < PAGE_SIZE / sizeof(*ident_pt); i++ )
> +            ident_pt[i] = ((i << 22) | _PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
> +                           _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
> +        unmap_domain_page(ident_pt);
> +
> +        guest_physmap_add_page(d, _gfn(HVM_IDENT_PT_GFN),
> +                               _mfn(page_to_mfn(page)), 0);

Please pick an existing gfn and map_domain_gfn() instead.  This will
avoid an unnecessary shattering of hap superpages.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 01/12] x86/paging: introduce paging_set_allocation
  2016-07-29 16:28 ` [PATCH RFC 01/12] x86/paging: introduce paging_set_allocation Roger Pau Monne
  2016-07-29 16:47   ` Andrew Cooper
@ 2016-08-01  8:57   ` Tim Deegan
  1 sibling, 0 replies; 49+ messages in thread
From: Tim Deegan @ 2016-08-01  8:57 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: George Dunlap, xen-devel, Jan Beulich, Andrew Cooper

Hi,

At 18:28 +0200 on 29 Jul (1469816936), Roger Pau Monne wrote:
> --- a/xen/arch/x86/mm/shadow/common.c
> +++ b/xen/arch/x86/mm/shadow/common.c
> @@ -1604,9 +1604,8 @@ shadow_free_p2m_page(struct domain *d, struct page_info *pg)
>   * Input will be rounded up to at least shadow_min_acceptable_pages(),
>   * plus space for the p2m table.
>   * Returns 0 for success, non-zero for failure. */
> -static unsigned int sh_set_allocation(struct domain *d,
> -                                      unsigned int pages,
> -                                      int *preempted)
> +unsigned int sh_set_allocation(struct domain *d, unsigned int pages,
> +                               int *preempted)

Making this non-static is fine by me.  Since you'll be respinning
anyway for Andrew's comments about can you please also move the full
comment above it to shadow.h so callers can see the rounding of inputs
&c.  It might be good to repeat some of that in the comment for
paging_set_allocation() too.

With that, Acked-by: Tim Deegan <tim@xen.org>

Cheers,

Tim.



* Re: [PATCH RFC 03/12] xen/x86: allow the emulated APICs to be enabled for the hardware domain
  2016-07-29 17:50   ` Andrew Cooper
@ 2016-08-01 11:23     ` Roger Pau Monne
  0 siblings, 0 replies; 49+ messages in thread
From: Roger Pau Monne @ 2016-08-01 11:23 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Jan Beulich

On Fri, Jul 29, 2016 at 06:50:48PM +0100, Andrew Cooper wrote:
> On 29/07/16 17:28, Roger Pau Monne wrote:
> >          if ( is_hardware_domain(d) )
> > -            config->emulation_flags |= XEN_X86_EMU_PIT;
> > -        if ( config->emulation_flags != 0 &&
> > -             (config->emulation_flags !=
> > -              (is_hvm_domain(d) ? XEN_X86_EMU_ALL : XEN_X86_EMU_PIT)) )
> > +            emflags |= XEN_X86_EMU_PIT;
> > +
> > +        if ( (is_hardware_domain(d) ?
> > +              (is_hvm_domain(d) && emflags !=
> > +              (XEN_X86_EMU_PIT|XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC)) :
> > +              (emflags && (is_hvm_domain(d) ? (emflags != XEN_X86_EMU_ALL) :
> > +                                              (emflags != XEN_X86_EMU_PIT)))) )
> 
> This is getting very complicated.
> 
> It is rather important (security wise) and not a hotpath.  Could I
> please request that you move this logic to a helper such as:
> 
> static bool emulation_flags_ok(const struct domain *d, uint32_t emflags)
> 
> and make this more readable.  Any decent compiler will inline and
> simplify it appropriately.

Thanks, I've now moved it to a helper as suggested:

static bool emulation_flags_ok(const struct domain *d, uint32_t emflags)
{
    if ( is_hvm_domain(d) )
    {
        /* An HVM hardware domain gets exactly the PIT, LAPIC and IO APIC. */
        if ( is_hardware_domain(d) &&
             emflags != (XEN_X86_EMU_PIT|XEN_X86_EMU_LAPIC|XEN_X86_EMU_IOAPIC) )
            return false;
        /* Other HVM domains get either the full set of emulated devices or none. */
        if ( !is_hardware_domain(d) &&
             emflags != XEN_X86_EMU_ALL && emflags != 0 )
            return false;
    }
    else
    {
        /* PV or classic PVH: the hardware domain only gets an emulated PIT. */
        if ( is_hardware_domain(d) && emflags != XEN_X86_EMU_PIT )
            return false;
        if ( !is_hardware_domain(d) && emflags != 0 )
            return false;
    }
    return true;
}

Roger.


* Re: [PATCH RFC 04/12] xen/x86: split Dom0 build into PV and PVHv2
  2016-07-29 17:57   ` Andrew Cooper
@ 2016-08-01 11:36     ` Roger Pau Monne
  2016-08-04 18:28       ` Andrew Cooper
  0 siblings, 1 reply; 49+ messages in thread
From: Roger Pau Monne @ 2016-08-01 11:36 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Jan Beulich

On Fri, Jul 29, 2016 at 06:57:08PM +0100, Andrew Cooper wrote:
> On 29/07/16 17:28, Roger Pau Monne wrote:
> > Split the Dom0 builder into two different functions, one for PV (and classic
> > PVH), and another one for PVHv2. Introduce a new command line parameter,
> > dom0hvm in order to request the creation of a PVHv2 Dom0.
> >
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Cc: Jan Beulich <jbeulich@suse.com>
> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > ---
> >  xen/arch/x86/domain_build.c | 27 ++++++++++++++++++++++++++-
> >  xen/arch/x86/setup.c        |  9 +++++++++
> 
> A patch to docs/misc/xen-command-line.markdown please.

OK, I wasn't really sure if we want to introduce a new command line option, 
or just use dom0pvh. In any case I will add it, we can always alias dom0hvm 
to dom0pvh (or the other way around) when classic PVH support is 
removed.
 
> >  2 files changed, 35 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
> > index 09d79be..c0ef40f 100644
> > --- a/xen/arch/x86/domain_build.c
> > +++ b/xen/arch/x86/domain_build.c
> > @@ -952,7 +952,7 @@ static int __init setup_permissions(struct domain *d)
> >      return rc;
> >  }
> >  
> > -int __init construct_dom0(
> > +static int __init construct_dom0_pv(
> >      struct domain *d,
> >      const module_t *image, unsigned long image_headroom,
> >      module_t *initrd,
> > @@ -1647,6 +1647,31 @@ out:
> >      return rc;
> >  }
> >  
> > +static int __init construct_dom0_hvm(struct domain *d, const module_t *image,
> > +                                     unsigned long image_headroom,
> > +                                     module_t *initrd,
> > +                                     void *(*bootstrap_map)(const module_t *),
> > +                                     char *cmdline)
> > +{
> > +
> > +    printk("** Building a PVH Dom0 **\n");
> 
> Some naming curiosities here, especially given the parameter name.

Hm, AFAIK we agreed on keeping the 'PVH' naming, but since internally Xen 
has no concept of 'PVH' I think the constructor is better named HVM (and 
in fact, if PVH wasn't there before, I would just consider this an HVM Dom0).

If people prefer HVM I can certainly change it, but I think it's going to 
get messy.

> > +
> > +    return 0;
> > +}
> > +
> > +int __init construct_dom0(struct domain *d, const module_t *image,
> > +                          unsigned long image_headroom, module_t *initrd,
> > +                          void *(*bootstrap_map)(const module_t *),
> > +                          char *cmdline)
> > +{
> > +
> > +    return is_hvm_domain(d) ?
> > +            construct_dom0_hvm(d, image, image_headroom, initrd,
> > +                               bootstrap_map, cmdline) :
> > +            construct_dom0_pv(d, image, image_headroom, initrd, bootstrap_map,
> > +                              cmdline);
> 
> This could be slightly less awkwardly split as:
> 
> (is_hvm_domain(d) ? construct_dom0_hvm : construct_dom0_pv)
> (d, image, image_headroom, initrd, bootstrap_map, cmdline);
> 
> with some appropriate indentation/alignment.

Done, thanks!

Roger.


* Re: [PATCH RFC 06/12] xen/x86: populate PVHv2 Dom0 physical memory map
  2016-07-29 19:04   ` Andrew Cooper
@ 2016-08-02  9:19     ` Roger Pau Monne
  2016-08-04 18:43       ` Andrew Cooper
  0 siblings, 1 reply; 49+ messages in thread
From: Roger Pau Monne @ 2016-08-02  9:19 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Jan Beulich

On Fri, Jul 29, 2016 at 08:04:12PM +0100, Andrew Cooper wrote:
> On 29/07/16 17:29, Roger Pau Monne wrote:
> > Craft the Dom0 e820 memory map and populate it.
> >
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Cc: Jan Beulich <jbeulich@suse.com>
> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > ---
> >  xen/arch/x86/domain_build.c | 199 ++++++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 193 insertions(+), 6 deletions(-)
> >
> > diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
> > index c0ef40f..cb8ecbd 100644
> > --- a/xen/arch/x86/domain_build.c
> > +++ b/xen/arch/x86/domain_build.c
> > @@ -43,6 +43,11 @@ static long __initdata dom0_nrpages;
> >  static long __initdata dom0_min_nrpages;
> >  static long __initdata dom0_max_nrpages = LONG_MAX;
> >  
> > +/* GFN of the identity map for EPT. */
> > +#define HVM_IDENT_PT_GFN  0xfeffeu
> 
> The IDENT_PT is only needed on Gen1 VT-x hardware which doesn't support
> unrestricted-guest mode.  OTOH, it needs to be matched with VM86_TSS,
> which is also needed if hardware doesn't support unrestricted-guest. 
> Also, the IDENT_PT can live anywhere in GFN space.  0xfeffe is just
> convention for HVM guests as nothing else interesting lives there, but a
> PVH dom0 will want to ensure it doesn't alias anything interesting it
> might wish to map.
> 
> Now I look at it, there is substantial WTF.  The domain builder makes
> IDENT_PT, but HVMLoader makes VM86_TSS.  I presume this is a historical
> side effect of IDENT_PT being a prerequisite to even executing
> hvmloader, while VM86_TSS was only a prerequisite to executing the
> rombios payload.  Either way, eww :(
> 
> I think the VM86_TSS setup needs to move to unconditionally being beside
> IDENT_PT setup in the domain builder, and mirrored here in dom0_hvm()
> creation.  I don't think it is reasonable to expect an HVMLite payload
> to set up its own VM86_TSS if it didn't set up IDENT_PT.  (OTOH, the
> chances of an HVMLite guest ever needing to return to real mode is slim,
> but in the name of flexibility, it would be nice not to rule it out).
> 
> Longterm, it would be nice to disentangle the unrestricted-guest support
> from the main vmx code, and make it able to be compiled out.  There is
> lots of it, and it is all but dead on modern hardware.

Thanks! I didn't know anything about the VM86 TSS stuff; the fact that the 
identity page table and the VM86 TSS are set up at completely different points 
makes it quite hard to follow :/.

I've now moved the setup of the VM86 TSS inside the domain builder for both 
DomU and Dom0.
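
As a rough illustration only (this is a sketch under assumptions, not the 
actual follow-up patch; the helper name and the choice of a whole zeroed page 
are mine), the builder-side TSS setup could be shaped along these lines:

static int __init hvm_setup_vm86_tss(struct domain *d, unsigned long gfn)
{
    struct page_info *page = alloc_domheap_pages(d, 0, 0);
    void *tss;

    if ( page == NULL )
        return -ENOMEM;

    /* The VM86 TSS only needs to be zeroed guest memory at a known address. */
    tss = __map_domain_page(page);
    memset(tss, 0, PAGE_SIZE);
    unmap_domain_page(tss);

    guest_physmap_add_page(d, _gfn(gfn), _mfn(page_to_mfn(page)), 0);
    /* Record its address so real-mode emulation can find it. */
    d->arch.hvm_domain.params[HVM_PARAM_VM86_TSS] = (uint64_t)gfn << PAGE_SHIFT;

    return 0;
}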

> > @@ -591,6 +610,8 @@ static __init void pvh_setup_e820(struct domain *d, unsigned long nr_pages)
> >          {
> >              cur_pages += pages;
> >          }
> > +        ASSERT((entry_guest->addr & ~PAGE_MASK) == 0 &&
> > +               (entry_guest->size & ~PAGE_MASK) == 0);
> 
> This would be clearer as ASSERT(IS_ALIGNED(entry_guest->addr, PAGE_SIZE));
> 
> (although I suspect the IS_ALIGNED() predicate is newer than your first
> iteration of this code.)
> 
> >   next:
> >          d->arch.nr_e820++;
> >          entry_guest++;
> > @@ -1631,7 +1652,7 @@ static int __init construct_dom0_pv(
> >          dom0_update_physmap(d, pfn, mfn, 0);
> >  
> >          pvh_map_all_iomem(d, nr_pages);
> > -        pvh_setup_e820(d, nr_pages);
> > +        hvm_setup_e820(d, nr_pages);
> >      }
> >  
> >      if ( d->domain_id == hardware_domid )
> > @@ -1647,15 +1668,181 @@ out:
> >      return rc;
> >  }
> >  
> > +/* Helper to convert from bytes into human-readable form. */
> > +static void __init pretty_print_bytes(uint64_t size)
> > +{
> > +    const char* units[] = {"B", "KB", "MB", "GB", "TB"};
> 
> static const char *units[] __initconst
> 
> However, this particular array would be far smaller as
> 
> static const char units[][3] __initconst = { "B", ... };
> 
> as it avoids embedding 5x8byte pointers to get at 14 useful bytes of
> information.
> 
> > +    int i = 0;
> > +
> > +    while ( ++i < sizeof(units) && size >= 1024 )
> > +        size >>= 10; /* size /= 1024 */
> > +
> > +    printk("%4" PRIu64 "%2s", size, units[i-1]);
> 
> Wouldn't it be better to introduce a new %p format identifier to do
> this?  (Linux doesn't appear to have an existing format identifier which
> we can 'borrow')
> 
> It looks potentially useful elsewhere, and avoids the awkward
> 
> printk("Order %2u: ", MAX_ORDER - i);
> pretty_print_bytes(((uint64_t)1 << (MAX_ORDER - i + PAGE_SHIFT)) *
>                    hvm_mem_stats[MAX_ORDER - i]);
> printk("\n");
> 
> constructs you have below.

Done all the above.
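
For reference, the reworked helper might end up looking roughly like this (a 
sketch following the suggestions above, not the actual v2 code):

static void __init pretty_print_bytes(uint64_t size)
{
    static const char units[][3] __initconst = { "B", "KB", "MB", "GB", "TB" };
    unsigned int i = 0;

    /* Scale down by 1024 while a bigger unit still fits. */
    while ( i + 1 < ARRAY_SIZE(units) && size >= 1024 )
    {
        size >>= 10;
        i++;
    }

    printk("%4" PRIu64 "%2s", size, units[i]);
}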
 
> 
> > +}
> > +
> > +/* Calculate the biggest usable order given a size in bytes. */
> > +static inline unsigned int get_order(uint64_t size)
> > +{
> > +    unsigned int order;
> > +    uint64_t pg;
> > +
> > +    ASSERT((size & ~PAGE_MASK) == 0);
> > +    pg = PFN_DOWN(size);
> > +    for ( order = 0; pg >= (1 << (order + 1)); order++ );
> > +
> > +    return order;
> > +}
> 
> We already have get_order_from_bytes() and get_order_from_pages(), the
> latter of which looks like it will suit your usecase.

Not really, or at least they don't do the same as get_order. This function 
calculates the maximum order you can use so that there are no pages left 
over (i.e. if you have a size of 3145728 bytes (3MiB), this function will 
return order 9 (2MiB), while the other ones will return order 10 (4MiB)). I 
don't really understand why other places in the code request bigger orders and 
then free the leftovers; isn't this also causing memory shattering?
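
To make the difference concrete, here is a standalone toy example (plain C, 
not Xen code) for the 3MiB case mentioned above: the first helper mimics 
get_order()'s "largest order that still fits completely" behaviour, the second 
the round-up behaviour of the existing helpers.

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Largest order such that (1 << order) pages fit entirely within size. */
static unsigned int largest_fitting_order(uint64_t size)
{
    uint64_t pg = size >> PAGE_SHIFT;
    unsigned int order = 0;

    while ( pg >= (1ULL << (order + 1)) )
        order++;
    return order;
}

/* Smallest order such that (1 << order) pages cover size (rounds up). */
static unsigned int covering_order(uint64_t size)
{
    uint64_t pg = size >> PAGE_SHIFT;
    unsigned int order = 0;

    while ( (1ULL << order) < pg )
        order++;
    return order;
}

int main(void)
{
    uint64_t size = 3u << 20;   /* 3145728 bytes, i.e. 3MiB -> 768 pages */

    /* Prints "fits: 9 covers: 10". */
    printf("fits: %u covers: %u\n",
           largest_fitting_order(size), covering_order(size));
    return 0;
}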

> As a TODO item, I really would like to borrow some of the Linux
> infrastructure to calculate orders of constants at compile time, because
> a large number of callers of the existing get_order_*() functions are
> performing unnecessary runtime calculation.  For those which need
> runtime calculation, some careful use of ffs() would be preferable to a
> loop.
> 
> > +
> > +/* Populate an HVM memory range using the biggest possible order. */
> > +static void __init hvm_populate_memory_range(struct domain *d, uint64_t start,
> > +                                             uint64_t size)
> > +{
> > +    static unsigned int __initdata memflags = MEMF_no_dma|MEMF_exact_node;
> > +    unsigned int order;
> > +    struct page_info *page;
> > +    int rc;
> > +
> > +    ASSERT((size & ~PAGE_MASK) == 0 && (start & ~PAGE_MASK) == 0);
> > +
> > +    order = MAX_ORDER;
> > +    while ( size != 0 )
> > +    {
> > +        order = min(get_order(size), order);
> > +        page = alloc_domheap_pages(d, order, memflags);
> > +        if ( page == NULL )
> > +        {
> > +            if ( order == 0 && memflags )
> > +            {
> > +                /* Try again without any memflags. */
> > +                memflags = 0;
> > +                order = MAX_ORDER;
> > +                continue;
> > +            }
> > +            if ( order == 0 )
> > +                panic("Unable to allocate memory with order 0!\n");
> > +            order--;
> > +            continue;
> > +        }
> 
> It would be far more efficient to try and allocate only 1G and 2M
> blocks.  Most of memory is free at this point, and it would allow the
> use of HAP superpage mappings, which will be a massive performance boost
> if they aren't shattered.

That's what I'm trying to do, but we might have to use pages of lower order 
when filling the smaller gaps. As an example, these are the stats when 
building a domain with 6048M of RAM:

(XEN) Memory allocation stats:
(XEN) Order 18: 5GB
(XEN) Order 17: 512MB
(XEN) Order 15: 256MB
(XEN) Order 14: 128MB
(XEN) Order 12: 16MB
(XEN) Order 10: 8MB
(XEN) Order  9: 4MB
(XEN) Order  8: 2MB
(XEN) Order  7: 1MB
(XEN) Order  6: 512KB
(XEN) Order  5: 256KB
(XEN) Order  4: 128KB
(XEN) Order  3: 64KB
(XEN) Order  2: 32KB
(XEN) Order  1: 16KB
(XEN) Order  0: 4KB

IMHO, they are quite good.

> > +
> > +        hvm_mem_stats[order]++;
> > +        rc = guest_physmap_add_page(d, _gfn(PFN_DOWN(start)),
> > +                                    _mfn(page_to_mfn(page)), order);
> > +        if ( rc != 0 )
> > +            panic("Failed to populate memory: [%" PRIx64 " - %" PRIx64 "] %d\n",
> > +                  start, start + (((uint64_t)1) << (order + PAGE_SHIFT)), rc);
> > +        start += ((uint64_t)1) << (order + PAGE_SHIFT);
> > +        size -= ((uint64_t)1) << (order + PAGE_SHIFT);
> > +        if ( (size & 0xffffffff) == 0 )
> > +            process_pending_softirqs();
> > +    }
> > +
> > +}
> > +
> > +static int __init hvm_setup_p2m(struct domain *d)
> > +{
> > +    struct vcpu *v = d->vcpu[0];
> > +    unsigned long nr_pages;
> > +    int i;
> > +
> > +    printk("** Preparing memory map **\n");
> > +
> > +    /*
> > +     * Subtract one page for the EPT identity page table and two pages
> > +     * for the MADT replacement.
> > +     */
> > +    nr_pages = compute_dom0_nr_pages(d, NULL, 0) - 3;
> > +
> > +    hvm_setup_e820(d, nr_pages);
> > +    paging_set_allocation(d, dom0_paging_pages(d, nr_pages));
> > +
> > +    printk("Dom0 memory map:\n");
> > +    print_e820_memory_map(d->arch.e820, d->arch.nr_e820);
> > +
> > +    printk("** Populating memory map **\n");
> > +    /* Populate memory map. */
> > +    for ( i = 0; i < d->arch.nr_e820; i++ )
> > +    {
> > +        if ( d->arch.e820[i].type != E820_RAM )
> > +            continue;
> > +
> > +        hvm_populate_memory_range(d, d->arch.e820[i].addr,
> > +                                  d->arch.e820[i].size);
> > +    }
> > +
> > +    printk("Memory allocation stats:\n");
> > +    for ( i = 0; i <= MAX_ORDER; i++ )
> > +    {
> > +        if ( hvm_mem_stats[MAX_ORDER - i] != 0 )
> > +        {
> > +            printk("Order %2u: ", MAX_ORDER - i);
> > +            pretty_print_bytes(((uint64_t)1 << (MAX_ORDER - i + PAGE_SHIFT)) *
> > +                               hvm_mem_stats[MAX_ORDER - i]);
> > +            printk("\n");
> > +        }
> > +    }
> > +
> > +    if ( cpu_has_vmx && paging_mode_hap(d) && !vmx_unrestricted_guest(v) )
> > +    {
> > +        struct vcpu *saved_current;
> > +        struct page_info *page;
> > +        uint32_t *ident_pt;
> > +
> > +        /*
> > +         * Identity-map page table is required for running with CR0.PG=0
> > +         * when using Intel EPT. Create a 32-bit non-PAE page directory of
> > +         * superpages.
> > +         */
> > +        page = alloc_domheap_pages(d, 0, 0);
> > +        if ( unlikely(!page) )
> > +        {
> > +            printk("Unable to allocate page for identity map\n");
> > +            return -ENOMEM;
> > +        }
> > +
> > +        saved_current = current;
> > +        set_current(v);
> 
> What is this swizzle of current for?  I presume you hit an assertion
> somewhere?  Given how late we are in the boot process, it might be worth
> formally context switching to d0v0 earlier.

Hm, probably some leftover from a previous iteration, now removed.

> > +
> > +        ident_pt = __map_domain_page(page);
> > +        for ( i = 0; i < PAGE_SIZE / sizeof(*ident_pt); i++ )
> > +            ident_pt[i] = ((i << 22) | _PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
> > +                           _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
> > +        unmap_domain_page(ident_pt);
> > +
> > +        guest_physmap_add_page(d, _gfn(HVM_IDENT_PT_GFN),
> > +                               _mfn(page_to_mfn(page)), 0);
> 
> Please pick an existing gfn and map_domain_gfn() instead.  This will
> avoid an unnecessary shattering of hap superpages.

Right, I've picked some of the previously mapped RAM and shrunk the e820.
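
Something along these lines, presumably (a sketch of the idea only, reusing 
the field and helper names from the code quoted above; it is not the actual 
updated patch): carve the last page off the highest RAM entry, which has 
already been populated at this point, shrink the e820 accordingly and let the 
identity page table live in that gfn, so nothing outside the populated ranges 
needs mapping and no HAP superpage has to be shattered.

static unsigned long __init hvm_steal_ram_page(struct domain *d)
{
    unsigned int i = d->arch.nr_e820;

    while ( i-- )
    {
        struct e820entry *entry = &d->arch.e820[i];

        if ( entry->type != E820_RAM || entry->size < PAGE_SIZE )
            continue;

        /* Shrink the entry and hand back the gfn of the stolen page. */
        entry->size -= PAGE_SIZE;
        return PFN_DOWN(entry->addr + entry->size);
    }

    return 0; /* 0 stands for failure in this sketch; real code wants an error. */
}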

Roger.


* Re: [PATCH RFC 01/12] x86/paging: introduce paging_set_allocation
  2016-07-29 16:47   ` Andrew Cooper
@ 2016-08-02  9:47     ` Roger Pau Monne
  2016-08-02 15:49       ` Roger Pau Monne
  0 siblings, 1 reply; 49+ messages in thread
From: Roger Pau Monne @ 2016-08-02  9:47 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, xen-devel, Tim Deegan, Jan Beulich

On Fri, Jul 29, 2016 at 05:47:24PM +0100, Andrew Cooper wrote:
> On 29/07/16 17:28, Roger Pau Monne wrote:
> > diff --git a/xen/arch/x86/mm/paging.c b/xen/arch/x86/mm/paging.c
> > index 107fc8c..1b270df 100644
> > --- a/xen/arch/x86/mm/paging.c
> > +++ b/xen/arch/x86/mm/paging.c
> > @@ -953,6 +953,22 @@ void paging_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
> >          safe_write_pte(p, new);
> >  }
> >  
> > +int paging_set_allocation(struct domain *d, unsigned long pages)
> > +{
> > +    int rc;
> > +
> > +    ASSERT(paging_mode_enabled(d));
> > +
> > +    paging_lock(d);
> > +    if ( hap_enabled(d) )
> > +        rc = hap_set_allocation(d, pages, NULL);
> > +    else
> > +        rc = sh_set_allocation(d, pages, NULL);
> 
> (without looking at the rest of the series) Your NMI is probably a
> watchdog timeout from this call, as passing NULL means "non-preemptible".

I don't think so, the NMI I saw happened while the guest was booting.

> As this is for the construction of dom0, it would be better to take a
> preemptible pointer, loop in construct_dom0(), with a
> process_pending_softirqs() call in between.

Now fixed.

> > +    paging_unlock(d);
> > +
> > +    return rc;
> > +}
> > +
> >  /*
> >   * Local variables:
> >   * mode: C
> > diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
> > index c22362f..452e22e 100644
> > --- a/xen/arch/x86/mm/shadow/common.c
> > +++ b/xen/arch/x86/mm/shadow/common.c
> > @@ -1604,9 +1604,8 @@ shadow_free_p2m_page(struct domain *d, struct page_info *pg)
> >   * Input will be rounded up to at least shadow_min_acceptable_pages(),
> >   * plus space for the p2m table.
> >   * Returns 0 for success, non-zero for failure. */
> > -static unsigned int sh_set_allocation(struct domain *d,
> > -                                      unsigned int pages,
> > -                                      int *preempted)
> > +unsigned int sh_set_allocation(struct domain *d, unsigned int pages,
> > +                               int *preempted)
> >  {
> >      struct page_info *sp;
> >      unsigned int lower_bound;
> > diff --git a/xen/include/asm-x86/hap.h b/xen/include/asm-x86/hap.h
> > index c613836..e3c9c98 100644
> > --- a/xen/include/asm-x86/hap.h
> > +++ b/xen/include/asm-x86/hap.h
> > @@ -46,7 +46,8 @@ int   hap_track_dirty_vram(struct domain *d,
> >                             XEN_GUEST_HANDLE_64(uint8) dirty_bitmap);
> >  
> >  extern const struct paging_mode *hap_paging_get_mode(struct vcpu *);
> > -void hap_set_alloc_for_pvh_dom0(struct domain *d, unsigned long num_pages);
> > +unsigned int hap_set_allocation(struct domain *d, unsigned int pages,
> > +				int *preempted);
> 
> I also note from this change that there is an unsigned long => unsigned
> int truncation in the internals of *_set_allocation().  This should
> definitely be fixed, although possibly wants to be a separate change.

Right, let me take a stab at this.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 01/12] x86/paging: introduce paging_set_allocation
  2016-08-02  9:47     ` Roger Pau Monne
@ 2016-08-02 15:49       ` Roger Pau Monne
  2016-08-02 16:12         ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Roger Pau Monne @ 2016-08-02 15:49 UTC (permalink / raw)
  To: Andrew Cooper, George Dunlap, xen-devel, Tim Deegan, Jan Beulich

On Tue, Aug 02, 2016 at 11:47:22AM +0200, Roger Pau Monne wrote:
> On Fri, Jul 29, 2016 at 05:47:24PM +0100, Andrew Cooper wrote:
> > On 29/07/16 17:28, Roger Pau Monne wrote:
> > > diff --git a/xen/arch/x86/mm/paging.c b/xen/arch/x86/mm/paging.c
> > > index 107fc8c..1b270df 100644
> > > --- a/xen/arch/x86/mm/paging.c
> > > +++ b/xen/arch/x86/mm/paging.c
> > > @@ -953,6 +953,22 @@ void paging_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
> > >          safe_write_pte(p, new);
> > >  }
> > >  
> > > +int paging_set_allocation(struct domain *d, unsigned long pages)
> > > +{
> > > +    int rc;
> > > +
> > > +    ASSERT(paging_mode_enabled(d));
> > > +
> > > +    paging_lock(d);
> > > +    if ( hap_enabled(d) )
> > > +        rc = hap_set_allocation(d, pages, NULL);
> > > +    else
> > > +        rc = sh_set_allocation(d, pages, NULL);
> > 
> > (without looking at the rest of the series) Your NMI is probably a
> > watchdog timeout from this call, as passing NULL means "non-preemptible".
> 
> I don't think so, the NMI I saw happened while the guest was booting.
> 
> > As this is for the construction of dom0, it would be better to take a
> > preemptible pointer, loop in construct_dom0(), with a
> > process_pending_softirqs() call in between.
> 
> Now fixed.

Hm, I have to stand corrected: using hypercall_preempt_check (as 
any of the *_set_allocation functions do) is not safe at this point:

(XEN) ----[ Xen-4.8-unstable  x86_64  debug=y  Tainted:    C  ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d08022fd47>] hap.c#local_events_need_delivery+0x27/0x40
(XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: ffff83023f5a5000   rcx: ffff82d080312900
(XEN) rdx: 0000000000000001   rsi: ffff83023f5a56c8   rdi: ffff8300b213d000
(XEN) rbp: ffff82d080307cc8   rsp: ffff82d080307cc8   r8:  0180000000000000
(XEN) r9:  0000000000000000   r10: 0000000000247000   r11: ffff82d08029a5b0
(XEN) r12: 0000000000000011   r13: 00000000000023ac   r14: ffff82d080307d4c
(XEN) r15: ffff83023f5a56c8   cr0: 000000008005003b   cr4: 00000000001526e0
(XEN) cr3: 00000000b20fc000   cr2: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen code around <ffff82d08022fd47> (hap.c#local_events_need_delivery+0x27/0x40):
(XEN)  0d ad fa ff 48 8b 47 08 <80> 38 00 74 09 80 78 01 00 0f 94 c0 eb 02 31 c0
(XEN) Xen stack trace from rsp=ffff82d080307cc8:
(XEN)    ffff82d080307d08 ffff82d08022fc47 0000000000000000 ffff83023f5a5000
(XEN)    ffff83023f5a5648 0000000000000000 ffff82d080307d4c 0000000000002400
(XEN)    ffff82d080307d38 ffff82d08020008c 00000000000ffffd ffff8300b1efd000
(XEN)    ffff83023f5a5000 ffff82d080307d4c ffff82d080307d78 ffff82d0802cad30
(XEN)    0000000000203000 ffff83023f5a5000 ffff82d0802bf860 0000000000000000
(XEN)    0000000000000001 ffff83000008bef0 ffff82d080307de8 ffff82d0802c91e0
(XEN)    ffff82d080307de8 ffff82d080143900 ffff82d080307de8 0000000000000000
(XEN)    ffff83000008bf00 ffff82d0802eb480 ffff82d080307dc4 ffff82d08028b1cd
(XEN)    ffff83000008bf00 0000000000000000 0000000000000001 ffff83023f5a5000
(XEN)    ffff82d080307f08 ffff82d0802bf0c9 0000000000000000 0000000000000000
(XEN)    0000000000000000 ffff82d080307f18 ffff83000008bee0 0000000000000001
(XEN)    0000000000000001 0000000000000001 0000000000000000 0000000000100000
(XEN)    0000000000000001 0000000000247000 ffff83000008bef4 0000000000100000
(XEN)    ffff830100000000 0000000000247001 0000000000000014 0000000100000000
(XEN)    ffff8300ffffffec ffff83000008bef0 ffff82d0802e0640 ffff83000008bfb0
(XEN)    0000000000000000 0000000000000000 0000000000000111 0000000800000000
(XEN)    000000010000006e 0000000000000003 00000000000002f8 0000000000000000
(XEN)    00000000ad5c0bd0 0000000000000000 0000000000000001 0000000000000008
(XEN)    0000000000000000 ffff82d080100073 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82d08022fd47>] hap.c#local_events_need_delivery+0x27/0x40
(XEN)    [<ffff82d08022fc47>] hap_set_allocation+0x107/0x130
(XEN)    [<ffff82d08020008c>] paging_set_allocation+0x4c/0x80
(XEN)    [<ffff82d0802cad30>] domain_build.c#hvm_setup_p2m+0x70/0x1a0
(XEN)    [<ffff82d0802c91e0>] domain_build.c#construct_dom0_hvm+0x60/0x120
(XEN)    [<ffff82d0802bf0c9>] __start_xen+0x1ea9/0x23a0
(XEN)    [<ffff82d080100073>] __high_start+0x53/0x60
(XEN)
(XEN) Pagetable walk from 0000000000000000:
(XEN)  L4[0x000] = 0000000245233063 ffffffffffffffff
(XEN)  L3[0x000] = 0000000245232063 ffffffffffffffff
(XEN)  L2[0x000] = 0000000245231063 ffffffffffffffff
(XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000000000000
(XEN) ****************************************

I've tried setting current to d->vcpu[0], but that just makes the call to 
hypercall_preempt_check crash in some scheduler assert. In any case, I've 
added the preempt parameter to the paging_set_allocation function, but I 
don't plan to use it in the domain builder for the time being. Does that 
sound right?
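
For clarity, the interface change being talked about would look roughly like 
this (sketch only, mirroring the hunk quoted earlier in the thread):

int paging_set_allocation(struct domain *d, unsigned long pages,
                          int *preempted)
{
    int rc;

    ASSERT(paging_mode_enabled(d));

    paging_lock(d);
    if ( hap_enabled(d) )
        rc = hap_set_allocation(d, pages, preempted);
    else
        rc = sh_set_allocation(d, pages, preempted);
    paging_unlock(d);

    return rc;
}

The dom0 builder would simply pass NULL for preempted for now, keeping that 
call non-preemptible.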

Roger.


* Re: [PATCH RFC 01/12] x86/paging: introduce paging_set_allocation
  2016-08-02 15:49       ` Roger Pau Monne
@ 2016-08-02 16:12         ` Jan Beulich
  2016-08-03 15:11           ` George Dunlap
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2016-08-02 16:12 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, xen-devel

>>> On 02.08.16 at 17:49, <roger.pau@citrix.com> wrote:
> On Tue, Aug 02, 2016 at 11:47:22AM +0200, Roger Pau Monne wrote:
>> On Fri, Jul 29, 2016 at 05:47:24PM +0100, Andrew Cooper wrote:
>> > As this is for the construction of dom0, it would be better to take a
>> > preemptible pointer, loop in construct_dom0(), with a
>> > process_pending_softirqs() call in between.
>> 
>> Now fixed.
> 
> Hm, I have to stand corrected, using hypercall_preempt_check (as 
> any of the *_set_allocation function use), is not safe at this point:
> 
> (XEN) ----[ Xen-4.8-unstable  x86_64  debug=y  Tainted:    C  ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82d08022fd47>] hap.c#local_events_need_delivery+0x27/0x40
> (XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
> (XEN) rax: 0000000000000000   rbx: ffff83023f5a5000   rcx: ffff82d080312900
> (XEN) rdx: 0000000000000001   rsi: ffff83023f5a56c8   rdi: ffff8300b213d000
> (XEN) rbp: ffff82d080307cc8   rsp: ffff82d080307cc8   r8:  0180000000000000
> (XEN) r9:  0000000000000000   r10: 0000000000247000   r11: ffff82d08029a5b0
> (XEN) r12: 0000000000000011   r13: 00000000000023ac   r14: ffff82d080307d4c
> (XEN) r15: ffff83023f5a56c8   cr0: 000000008005003b   cr4: 00000000001526e0
> (XEN) cr3: 00000000b20fc000   cr2: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) Xen code around <ffff82d08022fd47> (hap.c#local_events_need_delivery+0x27/0x40):
> (XEN)  0d ad fa ff 48 8b 47 08 <80> 38 00 74 09 80 78 01 00 0f 94 c0 eb 02 31 c0
> (XEN) Xen stack trace from rsp=ffff82d080307cc8:
> (XEN)    ffff82d080307d08 ffff82d08022fc47 0000000000000000 ffff83023f5a5000
> (XEN)    ffff83023f5a5648 0000000000000000 ffff82d080307d4c 0000000000002400
> (XEN)    ffff82d080307d38 ffff82d08020008c 00000000000ffffd ffff8300b1efd000
> (XEN)    ffff83023f5a5000 ffff82d080307d4c ffff82d080307d78 ffff82d0802cad30
> (XEN)    0000000000203000 ffff83023f5a5000 ffff82d0802bf860 0000000000000000
> (XEN)    0000000000000001 ffff83000008bef0 ffff82d080307de8 ffff82d0802c91e0
> (XEN)    ffff82d080307de8 ffff82d080143900 ffff82d080307de8 0000000000000000
> (XEN)    ffff83000008bf00 ffff82d0802eb480 ffff82d080307dc4 ffff82d08028b1cd
> (XEN)    ffff83000008bf00 0000000000000000 0000000000000001 ffff83023f5a5000
> (XEN)    ffff82d080307f08 ffff82d0802bf0c9 0000000000000000 0000000000000000
> (XEN)    0000000000000000 ffff82d080307f18 ffff83000008bee0 0000000000000001
> (XEN)    0000000000000001 0000000000000001 0000000000000000 0000000000100000
> (XEN)    0000000000000001 0000000000247000 ffff83000008bef4 0000000000100000
> (XEN)    ffff830100000000 0000000000247001 0000000000000014 0000000100000000
> (XEN)    ffff8300ffffffec ffff83000008bef0 ffff82d0802e0640 ffff83000008bfb0
> (XEN)    0000000000000000 0000000000000000 0000000000000111 0000000800000000
> (XEN)    000000010000006e 0000000000000003 00000000000002f8 0000000000000000
> (XEN)    00000000ad5c0bd0 0000000000000000 0000000000000001 0000000000000008
> (XEN)    0000000000000000 ffff82d080100073 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) Xen call trace:
> (XEN)    [<ffff82d08022fd47>] hap.c#local_events_need_delivery+0x27/0x40
> (XEN)    [<ffff82d08022fc47>] hap_set_allocation+0x107/0x130
> (XEN)    [<ffff82d08020008c>] paging_set_allocation+0x4c/0x80
> (XEN)    [<ffff82d0802cad30>] domain_build.c#hvm_setup_p2m+0x70/0x1a0
> (XEN)    [<ffff82d0802c91e0>] domain_build.c#construct_dom0_hvm+0x60/0x120
> (XEN)    [<ffff82d0802bf0c9>] __start_xen+0x1ea9/0x23a0
> (XEN)    [<ffff82d080100073>] __high_start+0x53/0x60
> (XEN)
> (XEN) Pagetable walk from 0000000000000000:

Sadly you don't make clear what pointer it is that is NULL at that point.

> I've tried setting current to d->vcpu[0], but that just makes the call to 
> hypercall_preempt_check crash in some scheduler assert. In any case, I've 
> added the preempt parameter to the paging_set_allocation function, but I 
> don't plan to use it in the domain builder for the time being. Does that 
> sound right?

Not really, new huge latency issues like this shouldn't be reintroduced;
we've been fighting hard to get rid of those (and we still occasionally
find some no-one had noticed before).

Jan


* Re: [PATCH RFC 01/12] x86/paging: introduce paging_set_allocation
  2016-08-02 16:12         ` Jan Beulich
@ 2016-08-03 15:11           ` George Dunlap
  2016-08-03 15:25             ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: George Dunlap @ 2016-08-03 15:11 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Tim Deegan, xen-devel, Roger Pau Monne

On Tue, Aug 2, 2016 at 5:12 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 02.08.16 at 17:49, <roger.pau@citrix.com> wrote:
>> On Tue, Aug 02, 2016 at 11:47:22AM +0200, Roger Pau Monne wrote:
>>> On Fri, Jul 29, 2016 at 05:47:24PM +0100, Andrew Cooper wrote:
>>> > As this is for the construction of dom0, it would be better to take a
>>> > preemptible pointer, loop in construct_dom0(), with a
>>> > process_pending_softirqs() call in between.
>>>
>>> Now fixed.
>>
>> Hm, I have to stand corrected, using hypercall_preempt_check (as
>> any of the *_set_allocation function use), is not safe at this point:
>>
>> (XEN) ----[ Xen-4.8-unstable  x86_64  debug=y  Tainted:    C  ]----
>> (XEN) CPU:    0
>> (XEN) RIP:    e008:[<ffff82d08022fd47>] hap.c#local_events_need_delivery+0x27/0x40
>> (XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
>> (XEN) rax: 0000000000000000   rbx: ffff83023f5a5000   rcx: ffff82d080312900
>> (XEN) rdx: 0000000000000001   rsi: ffff83023f5a56c8   rdi: ffff8300b213d000
>> (XEN) rbp: ffff82d080307cc8   rsp: ffff82d080307cc8   r8:  0180000000000000
>> (XEN) r9:  0000000000000000   r10: 0000000000247000   r11: ffff82d08029a5b0
>> (XEN) r12: 0000000000000011   r13: 00000000000023ac   r14: ffff82d080307d4c
>> (XEN) r15: ffff83023f5a56c8   cr0: 000000008005003b   cr4: 00000000001526e0
>> (XEN) cr3: 00000000b20fc000   cr2: 0000000000000000
>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>> (XEN) Xen code around <ffff82d08022fd47> (hap.c#local_events_need_delivery+0x27/0x40):
>> (XEN)  0d ad fa ff 48 8b 47 08 <80> 38 00 74 09 80 78 01 00 0f 94 c0 eb 02 31 c0
>> (XEN) Xen stack trace from rsp=ffff82d080307cc8:
>> (XEN)    ffff82d080307d08 ffff82d08022fc47 0000000000000000 ffff83023f5a5000
>> (XEN)    ffff83023f5a5648 0000000000000000 ffff82d080307d4c 0000000000002400
>> (XEN)    ffff82d080307d38 ffff82d08020008c 00000000000ffffd ffff8300b1efd000
>> (XEN)    ffff83023f5a5000 ffff82d080307d4c ffff82d080307d78 ffff82d0802cad30
>> (XEN)    0000000000203000 ffff83023f5a5000 ffff82d0802bf860 0000000000000000
>> (XEN)    0000000000000001 ffff83000008bef0 ffff82d080307de8 ffff82d0802c91e0
>> (XEN)    ffff82d080307de8 ffff82d080143900 ffff82d080307de8 0000000000000000
>> (XEN)    ffff83000008bf00 ffff82d0802eb480 ffff82d080307dc4 ffff82d08028b1cd
>> (XEN)    ffff83000008bf00 0000000000000000 0000000000000001 ffff83023f5a5000
>> (XEN)    ffff82d080307f08 ffff82d0802bf0c9 0000000000000000 0000000000000000
>> (XEN)    0000000000000000 ffff82d080307f18 ffff83000008bee0 0000000000000001
>> (XEN)    0000000000000001 0000000000000001 0000000000000000 0000000000100000
>> (XEN)    0000000000000001 0000000000247000 ffff83000008bef4 0000000000100000
>> (XEN)    ffff830100000000 0000000000247001 0000000000000014 0000000100000000
>> (XEN)    ffff8300ffffffec ffff83000008bef0 ffff82d0802e0640 ffff83000008bfb0
>> (XEN)    0000000000000000 0000000000000000 0000000000000111 0000000800000000
>> (XEN)    000000010000006e 0000000000000003 00000000000002f8 0000000000000000
>> (XEN)    00000000ad5c0bd0 0000000000000000 0000000000000001 0000000000000008
>> (XEN)    0000000000000000 ffff82d080100073 0000000000000000 0000000000000000
>> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN) Xen call trace:
>> (XEN)    [<ffff82d08022fd47>] hap.c#local_events_need_delivery+0x27/0x40
>> (XEN)    [<ffff82d08022fc47>] hap_set_allocation+0x107/0x130
>> (XEN)    [<ffff82d08020008c>] paging_set_allocation+0x4c/0x80
>> (XEN)    [<ffff82d0802cad30>] domain_build.c#hvm_setup_p2m+0x70/0x1a0
>> (XEN)    [<ffff82d0802c91e0>] domain_build.c#construct_dom0_hvm+0x60/0x120
>> (XEN)    [<ffff82d0802bf0c9>] __start_xen+0x1ea9/0x23a0
>> (XEN)    [<ffff82d080100073>] __high_start+0x53/0x60
>> (XEN)
>> (XEN) Pagetable walk from 0000000000000000:
>
> Sadly you don't make clear what pointer it is that is NULL at that point.

It sounds from what he says in the following paragraph like current is NULL.

>> I've tried setting current to d->vcpu[0], but that just makes the call to
>> hypercall_preempt_check crash in some scheduler assert. In any case, I've
>> added the preempt parameter to the paging_set_allocation function, but I
>> don't plan to use it in the domain builder for the time being. Does that
>> sound right?
>
> Not really, new huge latency issues like this shouldn't be reintroduced;
> we've been fighting hard to get rid of those (and we still occasionally
> find some no-one had noticed before).

You mean latency in processing softirqs?

Maybe what we need to do is to make local_events_need_delivery() safe
to call at this point by having it return 0 if current is NULL rather
than crashing?
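
The guard would be roughly this shape (illustration only: the body of the real 
hap.c helper is not shown in this thread, so pending_events_for() below is a 
hypothetical stand-in for whatever check it actually performs):

static int local_events_need_delivery(void)
{
    struct vcpu *v = current;

    /* No vcpu context yet, e.g. while building dom0 from __start_xen(). */
    if ( v == NULL )
        return 0;

    return pending_events_for(v); /* hypothetical stand-in */
}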

 -George


* Re: [PATCH RFC 01/12] x86/paging: introduce paging_set_allocation
  2016-08-03 15:11           ` George Dunlap
@ 2016-08-03 15:25             ` Jan Beulich
  2016-08-03 15:28               ` George Dunlap
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2016-08-03 15:25 UTC (permalink / raw)
  To: George Dunlap; +Cc: Andrew Cooper, Tim Deegan, xen-devel, Roger Pau Monne

>>> On 03.08.16 at 17:11, <George.Dunlap@eu.citrix.com> wrote:
> On Tue, Aug 2, 2016 at 5:12 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 02.08.16 at 17:49, <roger.pau@citrix.com> wrote:
>>> On Tue, Aug 02, 2016 at 11:47:22AM +0200, Roger Pau Monne wrote:
>>>> On Fri, Jul 29, 2016 at 05:47:24PM +0100, Andrew Cooper wrote:
>>>> > As this is for the construction of dom0, it would be better to take a
>>>> > preemptible pointer, loop in construct_dom0(), with a
>>>> > process_pending_softirqs() call in between.
>>>>
>>>> Now fixed.
>>>
>>> Hm, I have to stand corrected, using hypercall_preempt_check (as
>>> any of the *_set_allocation function use), is not safe at this point:
>>>
>>> (XEN) ----[ Xen-4.8-unstable  x86_64  debug=y  Tainted:    C  ]----
>>> (XEN) CPU:    0
>>> (XEN) RIP:    e008:[<ffff82d08022fd47>] 
> hap.c#local_events_need_delivery+0x27/0x40
>>> (XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
>>> (XEN) rax: 0000000000000000   rbx: ffff83023f5a5000   rcx: ffff82d080312900
>>> (XEN) rdx: 0000000000000001   rsi: ffff83023f5a56c8   rdi: ffff8300b213d000
>>> (XEN) rbp: ffff82d080307cc8   rsp: ffff82d080307cc8   r8:  0180000000000000
>>> (XEN) r9:  0000000000000000   r10: 0000000000247000   r11: ffff82d08029a5b0
>>> (XEN) r12: 0000000000000011   r13: 00000000000023ac   r14: ffff82d080307d4c
>>> (XEN) r15: ffff83023f5a56c8   cr0: 000000008005003b   cr4: 00000000001526e0
>>> (XEN) cr3: 00000000b20fc000   cr2: 0000000000000000
>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>> (XEN) Xen code around <ffff82d08022fd47> 
> (hap.c#local_events_need_delivery+0x27/0x40):
>>> (XEN)  0d ad fa ff 48 8b 47 08 <80> 38 00 74 09 80 78 01 00 0f 94 c0 eb 02 31 
> c0
>>> (XEN) Xen stack trace from rsp=ffff82d080307cc8:
>>> (XEN)    ffff82d080307d08 ffff82d08022fc47 0000000000000000 ffff83023f5a5000
>>> (XEN)    ffff83023f5a5648 0000000000000000 ffff82d080307d4c 0000000000002400
>>> (XEN)    ffff82d080307d38 ffff82d08020008c 00000000000ffffd ffff8300b1efd000
>>> (XEN)    ffff83023f5a5000 ffff82d080307d4c ffff82d080307d78 ffff82d0802cad30
>>> (XEN)    0000000000203000 ffff83023f5a5000 ffff82d0802bf860 0000000000000000
>>> (XEN)    0000000000000001 ffff83000008bef0 ffff82d080307de8 ffff82d0802c91e0
>>> (XEN)    ffff82d080307de8 ffff82d080143900 ffff82d080307de8 0000000000000000
>>> (XEN)    ffff83000008bf00 ffff82d0802eb480 ffff82d080307dc4 ffff82d08028b1cd
>>> (XEN)    ffff83000008bf00 0000000000000000 0000000000000001 ffff83023f5a5000
>>> (XEN)    ffff82d080307f08 ffff82d0802bf0c9 0000000000000000 0000000000000000
>>> (XEN)    0000000000000000 ffff82d080307f18 ffff83000008bee0 0000000000000001
>>> (XEN)    0000000000000001 0000000000000001 0000000000000000 0000000000100000
>>> (XEN)    0000000000000001 0000000000247000 ffff83000008bef4 0000000000100000
>>> (XEN)    ffff830100000000 0000000000247001 0000000000000014 0000000100000000
>>> (XEN)    ffff8300ffffffec ffff83000008bef0 ffff82d0802e0640 ffff83000008bfb0
>>> (XEN)    0000000000000000 0000000000000000 0000000000000111 0000000800000000
>>> (XEN)    000000010000006e 0000000000000003 00000000000002f8 0000000000000000
>>> (XEN)    00000000ad5c0bd0 0000000000000000 0000000000000001 0000000000000008
>>> (XEN)    0000000000000000 ffff82d080100073 0000000000000000 0000000000000000
>>> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> (XEN) Xen call trace:
>>> (XEN)    [<ffff82d08022fd47>] hap.c#local_events_need_delivery+0x27/0x40
>>> (XEN)    [<ffff82d08022fc47>] hap_set_allocation+0x107/0x130
>>> (XEN)    [<ffff82d08020008c>] paging_set_allocation+0x4c/0x80
>>> (XEN)    [<ffff82d0802cad30>] domain_build.c#hvm_setup_p2m+0x70/0x1a0
>>> (XEN)    [<ffff82d0802c91e0>] domain_build.c#construct_dom0_hvm+0x60/0x120
>>> (XEN)    [<ffff82d0802bf0c9>] __start_xen+0x1ea9/0x23a0
>>> (XEN)    [<ffff82d080100073>] __high_start+0x53/0x60
>>> (XEN)
>>> (XEN) Pagetable walk from 0000000000000000:
>>
>> Sadly you don't make clear what pointer it is that is NULL at that point.
> 
> It sounds from what he says in the following paragraph like current is NULL.

I don't recall us re-setting current to NULL this late in the boot process.
Even during early boot we set it to a bogus non-NULL value rather
than NULL.

>>> I've tried setting current to d->vcpu[0], but that just makes the call to
>>> hypercall_preempt_check crash in some scheduler assert. In any case, I've
>>> added the preempt parameter to the paging_set_allocation function, but I
>>> don't plan to use it in the domain builder for the time being. Does that
>>> sound right?
>>
>> Not really, new huge latency issues like this shouldn't be reintroduced;
>> we've been fighting hard to get rid of those (and we still occasionally
>> find some no-one had noticed before).
> 
> You mean latency in processing softirqs?
> 
> Maybe what we need to do is to make local_events_need_delivery() safe
> to call at this point by having it return 0 if current is NULL rather
> than crashing?

That would have the same effect - no softirq processing, and hence
possible time issues on huge systems.

Jan


* Re: [PATCH RFC 01/12] x86/paging: introduce paging_set_allocation
  2016-08-03 15:25             ` Jan Beulich
@ 2016-08-03 15:28               ` George Dunlap
  2016-08-03 15:37                 ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: George Dunlap @ 2016-08-03 15:28 UTC (permalink / raw)
  To: Jan Beulich, George Dunlap
  Cc: Andrew Cooper, Tim Deegan, xen-devel, Roger Pau Monne

On 03/08/16 16:25, Jan Beulich wrote:
>>>> On 03.08.16 at 17:11, <George.Dunlap@eu.citrix.com> wrote:
>> On Tue, Aug 2, 2016 at 5:12 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 02.08.16 at 17:49, <roger.pau@citrix.com> wrote:
>>>> On Tue, Aug 02, 2016 at 11:47:22AM +0200, Roger Pau Monne wrote:
>>>>> On Fri, Jul 29, 2016 at 05:47:24PM +0100, Andrew Cooper wrote:
>>>>>> As this is for the construction of dom0, it would be better to take a
>>>>>> preemptible pointer, loop in construct_dom0(), with a
>>>>>> process_pending_softirqs() call in between.
>>>>>
>>>>> Now fixed.
>>>>
>>>> Hm, I have to stand corrected, using hypercall_preempt_check (as
>>>> any of the *_set_allocation function use), is not safe at this point:
>>>>
>>>> (XEN) ----[ Xen-4.8-unstable  x86_64  debug=y  Tainted:    C  ]----
>>>> (XEN) CPU:    0
>>>> (XEN) RIP:    e008:[<ffff82d08022fd47>] 
>> hap.c#local_events_need_delivery+0x27/0x40
>>>> (XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
>>>> (XEN) rax: 0000000000000000   rbx: ffff83023f5a5000   rcx: ffff82d080312900
>>>> (XEN) rdx: 0000000000000001   rsi: ffff83023f5a56c8   rdi: ffff8300b213d000
>>>> (XEN) rbp: ffff82d080307cc8   rsp: ffff82d080307cc8   r8:  0180000000000000
>>>> (XEN) r9:  0000000000000000   r10: 0000000000247000   r11: ffff82d08029a5b0
>>>> (XEN) r12: 0000000000000011   r13: 00000000000023ac   r14: ffff82d080307d4c
>>>> (XEN) r15: ffff83023f5a56c8   cr0: 000000008005003b   cr4: 00000000001526e0
>>>> (XEN) cr3: 00000000b20fc000   cr2: 0000000000000000
>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>>> (XEN) Xen code around <ffff82d08022fd47> 
>> (hap.c#local_events_need_delivery+0x27/0x40):
>>>> (XEN)  0d ad fa ff 48 8b 47 08 <80> 38 00 74 09 80 78 01 00 0f 94 c0 eb 02 31 
>> c0
>>>> (XEN) Xen stack trace from rsp=ffff82d080307cc8:
>>>> (XEN)    ffff82d080307d08 ffff82d08022fc47 0000000000000000 ffff83023f5a5000
>>>> (XEN)    ffff83023f5a5648 0000000000000000 ffff82d080307d4c 0000000000002400
>>>> (XEN)    ffff82d080307d38 ffff82d08020008c 00000000000ffffd ffff8300b1efd000
>>>> (XEN)    ffff83023f5a5000 ffff82d080307d4c ffff82d080307d78 ffff82d0802cad30
>>>> (XEN)    0000000000203000 ffff83023f5a5000 ffff82d0802bf860 0000000000000000
>>>> (XEN)    0000000000000001 ffff83000008bef0 ffff82d080307de8 ffff82d0802c91e0
>>>> (XEN)    ffff82d080307de8 ffff82d080143900 ffff82d080307de8 0000000000000000
>>>> (XEN)    ffff83000008bf00 ffff82d0802eb480 ffff82d080307dc4 ffff82d08028b1cd
>>>> (XEN)    ffff83000008bf00 0000000000000000 0000000000000001 ffff83023f5a5000
>>>> (XEN)    ffff82d080307f08 ffff82d0802bf0c9 0000000000000000 0000000000000000
>>>> (XEN)    0000000000000000 ffff82d080307f18 ffff83000008bee0 0000000000000001
>>>> (XEN)    0000000000000001 0000000000000001 0000000000000000 0000000000100000
>>>> (XEN)    0000000000000001 0000000000247000 ffff83000008bef4 0000000000100000
>>>> (XEN)    ffff830100000000 0000000000247001 0000000000000014 0000000100000000
>>>> (XEN)    ffff8300ffffffec ffff83000008bef0 ffff82d0802e0640 ffff83000008bfb0
>>>> (XEN)    0000000000000000 0000000000000000 0000000000000111 0000000800000000
>>>> (XEN)    000000010000006e 0000000000000003 00000000000002f8 0000000000000000
>>>> (XEN)    00000000ad5c0bd0 0000000000000000 0000000000000001 0000000000000008
>>>> (XEN)    0000000000000000 ffff82d080100073 0000000000000000 0000000000000000
>>>> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>>> (XEN) Xen call trace:
>>>> (XEN)    [<ffff82d08022fd47>] hap.c#local_events_need_delivery+0x27/0x40
>>>> (XEN)    [<ffff82d08022fc47>] hap_set_allocation+0x107/0x130
>>>> (XEN)    [<ffff82d08020008c>] paging_set_allocation+0x4c/0x80
>>>> (XEN)    [<ffff82d0802cad30>] domain_build.c#hvm_setup_p2m+0x70/0x1a0
>>>> (XEN)    [<ffff82d0802c91e0>] domain_build.c#construct_dom0_hvm+0x60/0x120
>>>> (XEN)    [<ffff82d0802bf0c9>] __start_xen+0x1ea9/0x23a0
>>>> (XEN)    [<ffff82d080100073>] __high_start+0x53/0x60
>>>> (XEN)
>>>> (XEN) Pagetable walk from 0000000000000000:
>>>
>>> Sadly you don't make clear what pointer it is that is NULL at that point.
>>
>> It sounds from what he says in the following paragraph like current is NULL.
> 
> I don't recall us re-setting current to this late in the boot process.
> Even during early boot we set it to a bogus non-NULL value rather
> than NULL.
> 
>>>> I've tried setting current to d->vcpu[0], but that just makes the call to
>>>> hypercall_preempt_check crash in some scheduler assert. In any case, I've
>>>> added the preempt parameter to the paging_set_allocation function, but I
>>>> don't plan to use it in the domain builder for the time being. Does that
>>>> sound right?
>>>
>>> Not really, new huge latency issues like this shouldn't be reintroduced;
>>> we've been fighting hard to get rid of those (and we still occasionally
>>> find some no-one had noticed before).
>>
>> You mean latency in processing softirqs?
>>
>> Maybe what we need to do is to make local_events_need_delivery() safe
>> to call at this point by having it return 0 if current is NULL rather
>> than crashing?
> 
> That would have the same effect - no softirq processing, and hence
> possible time issues on huge systems.

No, local_events_need_delivery() only checks to see if the current vcpu has
outstanding virtual interrupts.  The other half of
hypercall_preempt_check() checks for softirqs, which doesn't appear to
rely on having current available.

 -George

> 
> Jan
> 



* Re: [PATCH RFC 01/12] x86/paging: introduce paging_set_allocation
  2016-08-03 15:28               ` George Dunlap
@ 2016-08-03 15:37                 ` Jan Beulich
  2016-08-03 15:59                   ` George Dunlap
  2016-08-03 16:00                   ` Roger Pau Monne
  0 siblings, 2 replies; 49+ messages in thread
From: Jan Beulich @ 2016-08-03 15:37 UTC (permalink / raw)
  To: George Dunlap
  Cc: George Dunlap, Andrew Cooper, Tim Deegan, xen-devel, Roger Pau Monne

>>> On 03.08.16 at 17:28, <george.dunlap@citrix.com> wrote:
> On 03/08/16 16:25, Jan Beulich wrote:
>>>>> On 03.08.16 at 17:11, <George.Dunlap@eu.citrix.com> wrote:
>>> On Tue, Aug 2, 2016 at 5:12 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>> On 02.08.16 at 17:49, <roger.pau@citrix.com> wrote:
>>>>> On Tue, Aug 02, 2016 at 11:47:22AM +0200, Roger Pau Monne wrote:
>>>>>> On Fri, Jul 29, 2016 at 05:47:24PM +0100, Andrew Cooper wrote:
>>>>>>> As this is for the construction of dom0, it would be better to take a
>>>>>>> preemptible pointer, loop in construct_dom0(), with a
>>>>>>> process_pending_softirqs() call in between.
>>>>>>
>>>>>> Now fixed.
>>>>>
>>>>> Hm, I have to stand corrected, using hypercall_preempt_check (as
>>>>> any of the *_set_allocation function use), is not safe at this point:
>>>>>
>>>>> (XEN) ----[ Xen-4.8-unstable  x86_64  debug=y  Tainted:    C  ]----
>>>>> (XEN) CPU:    0
>>>>> (XEN) RIP:    e008:[<ffff82d08022fd47>] 
>>> hap.c#local_events_need_delivery+0x27/0x40
>>>>> (XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
>>>>> (XEN) rax: 0000000000000000   rbx: ffff83023f5a5000   rcx: ffff82d080312900
>>>>> (XEN) rdx: 0000000000000001   rsi: ffff83023f5a56c8   rdi: ffff8300b213d000
>>>>> (XEN) rbp: ffff82d080307cc8   rsp: ffff82d080307cc8   r8:  0180000000000000
>>>>> (XEN) r9:  0000000000000000   r10: 0000000000247000   r11: ffff82d08029a5b0
>>>>> (XEN) r12: 0000000000000011   r13: 00000000000023ac   r14: ffff82d080307d4c
>>>>> (XEN) r15: ffff83023f5a56c8   cr0: 000000008005003b   cr4: 00000000001526e0
>>>>> (XEN) cr3: 00000000b20fc000   cr2: 0000000000000000
>>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>>>> (XEN) Xen code around <ffff82d08022fd47> 
>>> (hap.c#local_events_need_delivery+0x27/0x40):
>>>>> (XEN)  0d ad fa ff 48 8b 47 08 <80> 38 00 74 09 80 78 01 00 0f 94 c0 eb 02 31 
>>> c0
>>>>> (XEN) Xen stack trace from rsp=ffff82d080307cc8:
>>>>> (XEN)    ffff82d080307d08 ffff82d08022fc47 0000000000000000 ffff83023f5a5000
>>>>> (XEN)    ffff83023f5a5648 0000000000000000 ffff82d080307d4c 0000000000002400
>>>>> (XEN)    ffff82d080307d38 ffff82d08020008c 00000000000ffffd ffff8300b1efd000
>>>>> (XEN)    ffff83023f5a5000 ffff82d080307d4c ffff82d080307d78 ffff82d0802cad30
>>>>> (XEN)    0000000000203000 ffff83023f5a5000 ffff82d0802bf860 0000000000000000
>>>>> (XEN)    0000000000000001 ffff83000008bef0 ffff82d080307de8 ffff82d0802c91e0
>>>>> (XEN)    ffff82d080307de8 ffff82d080143900 ffff82d080307de8 0000000000000000
>>>>> (XEN)    ffff83000008bf00 ffff82d0802eb480 ffff82d080307dc4 ffff82d08028b1cd
>>>>> (XEN)    ffff83000008bf00 0000000000000000 0000000000000001 ffff83023f5a5000
>>>>> (XEN)    ffff82d080307f08 ffff82d0802bf0c9 0000000000000000 0000000000000000
>>>>> (XEN)    0000000000000000 ffff82d080307f18 ffff83000008bee0 0000000000000001
>>>>> (XEN)    0000000000000001 0000000000000001 0000000000000000 0000000000100000
>>>>> (XEN)    0000000000000001 0000000000247000 ffff83000008bef4 0000000000100000
>>>>> (XEN)    ffff830100000000 0000000000247001 0000000000000014 0000000100000000
>>>>> (XEN)    ffff8300ffffffec ffff83000008bef0 ffff82d0802e0640 ffff83000008bfb0
>>>>> (XEN)    0000000000000000 0000000000000000 0000000000000111 0000000800000000
>>>>> (XEN)    000000010000006e 0000000000000003 00000000000002f8 0000000000000000
>>>>> (XEN)    00000000ad5c0bd0 0000000000000000 0000000000000001 0000000000000008
>>>>> (XEN)    0000000000000000 ffff82d080100073 0000000000000000 0000000000000000
>>>>> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>>>> (XEN) Xen call trace:
>>>>> (XEN)    [<ffff82d08022fd47>] hap.c#local_events_need_delivery+0x27/0x40
>>>>> (XEN)    [<ffff82d08022fc47>] hap_set_allocation+0x107/0x130
>>>>> (XEN)    [<ffff82d08020008c>] paging_set_allocation+0x4c/0x80
>>>>> (XEN)    [<ffff82d0802cad30>] domain_build.c#hvm_setup_p2m+0x70/0x1a0
>>>>> (XEN)    [<ffff82d0802c91e0>] domain_build.c#construct_dom0_hvm+0x60/0x120
>>>>> (XEN)    [<ffff82d0802bf0c9>] __start_xen+0x1ea9/0x23a0
>>>>> (XEN)    [<ffff82d080100073>] __high_start+0x53/0x60
>>>>> (XEN)
>>>>> (XEN) Pagetable walk from 0000000000000000:
>>>>
>>>> Sadly you don't make clear what pointer it is that is NULL at that point.
>>>
>>> It sounds from what he says in the following paragraph like current is NULL.
>> 
>> I don't recall us re-setting current to this late in the boot process.
>> Even during early boot we set it to a bogus non-NULL value rather
>> than NULL.
>> 
>>>>> I've tried setting current to d->vcpu[0], but that just makes the call to
>>>>> hypercall_preempt_check crash in some scheduler assert. In any case, I've
>>>>> added the preempt parameter to the paging_set_allocation function, but I
>>>>> don't plan to use it in the domain builder for the time being. Does that
>>>>> sound right?
>>>>
>>>> Not really, new huge latency issues like this shouldn't be reintroduced;
>>>> we've been fighting hard to get rid of those (and we still occasionally
>>>> find some no-one had noticed before).
>>>
>>> You mean latency in processing softirqs?
>>>
>>> Maybe what we need to do is to make local_events_need_delivery() safe
>>> to call at this point by having it return 0 if current is NULL rather
>>> than crashing?
>> 
>> That would have the same effect - no softirq processing, and hence
>> possible time issues on huge systems.
> 
> No, local_events_delivery() only checks to see if the current vcpu has
> outstanding virtual interrupts.  The other half of
> hypercall_preempt_check() checks for softirqs, which doesn't appear to
> rely on having current available.

Good point, but
- current should nevertheless not be NULL (afaict at least),
- hypercall_preempt_check() is probably the wrong construct,
  as we're no in a hypercall.
The latter of course could be addressed by, as you did suggest,
some refinement to one of the pieces it's being made up from,
but I'm not sure that would be better than perhaps making its
invocation conditional (with some better alternative in the "else"
case) in hap_set_allocation(). Not the least because any
adjustment to hypercall_preempt_check() itself would affect all
of its other users.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 01/12] x86/paging: introduce paging_set_allocation
  2016-08-03 15:37                 ` Jan Beulich
@ 2016-08-03 15:59                   ` George Dunlap
  2016-08-03 16:00                   ` Roger Pau Monne
  1 sibling, 0 replies; 49+ messages in thread
From: George Dunlap @ 2016-08-03 15:59 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Roger Pau Monne, Tim Deegan, xen-devel

On Wed, Aug 3, 2016 at 4:37 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 03.08.16 at 17:28, <george.dunlap@citrix.com> wrote:
>> On 03/08/16 16:25, Jan Beulich wrote:
>>>>>> On 03.08.16 at 17:11, <George.Dunlap@eu.citrix.com> wrote:
>>>> On Tue, Aug 2, 2016 at 5:12 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>> On 02.08.16 at 17:49, <roger.pau@citrix.com> wrote:
>>>>>> On Tue, Aug 02, 2016 at 11:47:22AM +0200, Roger Pau Monne wrote:
>>>>>>> On Fri, Jul 29, 2016 at 05:47:24PM +0100, Andrew Cooper wrote:
>>>>>>>> As this is for the construction of dom0, it would be better to take a
>>>>>>>> preemptible pointer, loop in construct_dom0(), with a
>>>>>>>> process_pending_softirqs() call in between.
>>>>>>>
>>>>>>> Now fixed.
>>>>>>
>>>>>> Hm, I have to stand corrected, using hypercall_preempt_check (as
>>>>>> any of the *_set_allocation function use), is not safe at this point:
>>>>>>
>>>>>> (XEN) ----[ Xen-4.8-unstable  x86_64  debug=y  Tainted:    C  ]----
>>>>>> (XEN) CPU:    0
>>>>>> (XEN) RIP:    e008:[<ffff82d08022fd47>]
>>>> hap.c#local_events_need_delivery+0x27/0x40
>>>>>> (XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
>>>>>> (XEN) rax: 0000000000000000   rbx: ffff83023f5a5000   rcx: ffff82d080312900
>>>>>> (XEN) rdx: 0000000000000001   rsi: ffff83023f5a56c8   rdi: ffff8300b213d000
>>>>>> (XEN) rbp: ffff82d080307cc8   rsp: ffff82d080307cc8   r8:  0180000000000000
>>>>>> (XEN) r9:  0000000000000000   r10: 0000000000247000   r11: ffff82d08029a5b0
>>>>>> (XEN) r12: 0000000000000011   r13: 00000000000023ac   r14: ffff82d080307d4c
>>>>>> (XEN) r15: ffff83023f5a56c8   cr0: 000000008005003b   cr4: 00000000001526e0
>>>>>> (XEN) cr3: 00000000b20fc000   cr2: 0000000000000000
>>>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>>>>> (XEN) Xen code around <ffff82d08022fd47>
>>>> (hap.c#local_events_need_delivery+0x27/0x40):
>>>>>> (XEN)  0d ad fa ff 48 8b 47 08 <80> 38 00 74 09 80 78 01 00 0f 94 c0 eb 02 31
>>>> c0
>>>>>> (XEN) Xen stack trace from rsp=ffff82d080307cc8:
>>>>>> (XEN)    ffff82d080307d08 ffff82d08022fc47 0000000000000000 ffff83023f5a5000
>>>>>> (XEN)    ffff83023f5a5648 0000000000000000 ffff82d080307d4c 0000000000002400
>>>>>> (XEN)    ffff82d080307d38 ffff82d08020008c 00000000000ffffd ffff8300b1efd000
>>>>>> (XEN)    ffff83023f5a5000 ffff82d080307d4c ffff82d080307d78 ffff82d0802cad30
>>>>>> (XEN)    0000000000203000 ffff83023f5a5000 ffff82d0802bf860 0000000000000000
>>>>>> (XEN)    0000000000000001 ffff83000008bef0 ffff82d080307de8 ffff82d0802c91e0
>>>>>> (XEN)    ffff82d080307de8 ffff82d080143900 ffff82d080307de8 0000000000000000
>>>>>> (XEN)    ffff83000008bf00 ffff82d0802eb480 ffff82d080307dc4 ffff82d08028b1cd
>>>>>> (XEN)    ffff83000008bf00 0000000000000000 0000000000000001 ffff83023f5a5000
>>>>>> (XEN)    ffff82d080307f08 ffff82d0802bf0c9 0000000000000000 0000000000000000
>>>>>> (XEN)    0000000000000000 ffff82d080307f18 ffff83000008bee0 0000000000000001
>>>>>> (XEN)    0000000000000001 0000000000000001 0000000000000000 0000000000100000
>>>>>> (XEN)    0000000000000001 0000000000247000 ffff83000008bef4 0000000000100000
>>>>>> (XEN)    ffff830100000000 0000000000247001 0000000000000014 0000000100000000
>>>>>> (XEN)    ffff8300ffffffec ffff83000008bef0 ffff82d0802e0640 ffff83000008bfb0
>>>>>> (XEN)    0000000000000000 0000000000000000 0000000000000111 0000000800000000
>>>>>> (XEN)    000000010000006e 0000000000000003 00000000000002f8 0000000000000000
>>>>>> (XEN)    00000000ad5c0bd0 0000000000000000 0000000000000001 0000000000000008
>>>>>> (XEN)    0000000000000000 ffff82d080100073 0000000000000000 0000000000000000
>>>>>> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>>>>> (XEN) Xen call trace:
>>>>>> (XEN)    [<ffff82d08022fd47>] hap.c#local_events_need_delivery+0x27/0x40
>>>>>> (XEN)    [<ffff82d08022fc47>] hap_set_allocation+0x107/0x130
>>>>>> (XEN)    [<ffff82d08020008c>] paging_set_allocation+0x4c/0x80
>>>>>> (XEN)    [<ffff82d0802cad30>] domain_build.c#hvm_setup_p2m+0x70/0x1a0
>>>>>> (XEN)    [<ffff82d0802c91e0>] domain_build.c#construct_dom0_hvm+0x60/0x120
>>>>>> (XEN)    [<ffff82d0802bf0c9>] __start_xen+0x1ea9/0x23a0
>>>>>> (XEN)    [<ffff82d080100073>] __high_start+0x53/0x60
>>>>>> (XEN)
>>>>>> (XEN) Pagetable walk from 0000000000000000:
>>>>>
>>>>> Sadly you don't make clear what pointer it is that is NULL at that point.
>>>>
>>>> It sounds from what he says in the following paragraph like current is NULL.
>>>
>>> I don't recall us re-setting current to this late in the boot process.
>>> Even during early boot we set it to a bogus non-NULL value rather
>>> than NULL.
>>>
>>>>>> I've tried setting current to d->vcpu[0], but that just makes the call to
>>>>>> hypercall_preempt_check crash in some scheduler assert. In any case, I've
>>>>>> added the preempt parameter to the paging_set_allocation function, but I
>>>>>> don't plan to use it in the domain builder for the time being. Does that
>>>>>> sound right?
>>>>>
>>>>> Not really, new huge latency issues like this shouldn't be reintroduced;
>>>>> we've been fighting hard to get rid of those (and we still occasionally
>>>>> find some no-one had noticed before).
>>>>
>>>> You mean latency in processing softirqs?
>>>>
>>>> Maybe what we need to do is to make local_events_need_delivery() safe
>>>> to call at this point by having it return 0 if current is NULL rather
>>>> than crashing?
>>>
>>> That would have the same effect - no softirq processing, and hence
>>> possible time issues on huge systems.
>>
>> No, local_events_need_delivery() only checks to see if the current vcpu has
>> outstanding virtual interrupts.  The other half of
>> hypercall_preempt_check() checks for softirqs, which doesn't appear to
>> rely on having current available.
>
> Good point, but
> - current should nevertheless not be NULL (afaict at least),

Well then it's likely to be one of the vcpu fields that the idle
domain doesn't have.  I'd be willing to bet that v->vcpu_info is NULL
for the idle domain.

> - hypercall_preempt_check() is probably the wrong construct,
>   as we're not in a hypercall.
> The latter of course could be addressed by, as you did suggest,
> some refinement to one of the pieces it's being made up from,
> but I'm not sure that would be better than perhaps making its
> invocation conditional (with some better alternative in the "else"
> case) in hap_set_allocation(). Not the least because any
> adjustment to hypercall_preempt_check() itself would affect all
> of its other users.

Well at the moment any user that calls it with current == NULL (or
perhaps is_idle_vcpu(current)) will crash.  So the only cost will be
an extra check for all the other callers.

The only other option would be to make {hap,sh}_set_allocation() check
to see if is_idle_vcpu(current), and if so just call
softirq_pending(smp_processor_id()) instead.  Or make *another* macro,
boot_softirq_check_or_hypercall_preempt_check(), which will do the
check.  But both seem a bit excessive compared to just modifying
hypercall_preempt_check() to handle being called during boot.
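
For concreteness, the latter would be something like the sketch below
(untested, name deliberately awkward), which is part of why it feels
excessive to me:

    #define boot_softirq_check_or_hypercall_preempt_check()        \
        (is_idle_vcpu(current)                                      \
         ? !!softirq_pending(smp_processor_id())                    \
         : hypercall_preempt_check())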

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 01/12] x86/paging: introduce paging_set_allocation
  2016-08-03 15:37                 ` Jan Beulich
  2016-08-03 15:59                   ` George Dunlap
@ 2016-08-03 16:00                   ` Roger Pau Monne
  2016-08-03 16:15                     ` Jan Beulich
  1 sibling, 1 reply; 49+ messages in thread
From: Roger Pau Monne @ 2016-08-03 16:00 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Andrew Cooper, Tim Deegan, George Dunlap, xen-devel

On Wed, Aug 03, 2016 at 09:37:41AM -0600, Jan Beulich wrote:
> >>> On 03.08.16 at 17:28, <george.dunlap@citrix.com> wrote:
> > On 03/08/16 16:25, Jan Beulich wrote:
> >>>>> On 03.08.16 at 17:11, <George.Dunlap@eu.citrix.com> wrote:
> >>> On Tue, Aug 2, 2016 at 5:12 PM, Jan Beulich <JBeulich@suse.com> wrote:
> >>>>>>> On 02.08.16 at 17:49, <roger.pau@citrix.com> wrote:
> >>>>> On Tue, Aug 02, 2016 at 11:47:22AM +0200, Roger Pau Monne wrote:
> >>>>>> On Fri, Jul 29, 2016 at 05:47:24PM +0100, Andrew Cooper wrote:
> >>>>>>> As this is for the construction of dom0, it would be better to take a
> >>>>>>> preemptible pointer, loop in construct_dom0(), with a
> >>>>>>> process_pending_softirqs() call in between.
> >>>>>>
> >>>>>> Now fixed.
> >>>>>
> >>>>> Hm, I have to stand corrected, using hypercall_preempt_check (as
> >>>>> any of the *_set_allocation function use), is not safe at this point:
> >>>>>
> >>>>> (XEN) ----[ Xen-4.8-unstable  x86_64  debug=y  Tainted:    C  ]----
> >>>>> (XEN) CPU:    0
> >>>>> (XEN) RIP:    e008:[<ffff82d08022fd47>] 
> >>> hap.c#local_events_need_delivery+0x27/0x40
> >>>>> (XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
> >>>>> (XEN) rax: 0000000000000000   rbx: ffff83023f5a5000   rcx: ffff82d080312900
> >>>>> (XEN) rdx: 0000000000000001   rsi: ffff83023f5a56c8   rdi: ffff8300b213d000
> >>>>> (XEN) rbp: ffff82d080307cc8   rsp: ffff82d080307cc8   r8:  0180000000000000
> >>>>> (XEN) r9:  0000000000000000   r10: 0000000000247000   r11: ffff82d08029a5b0
> >>>>> (XEN) r12: 0000000000000011   r13: 00000000000023ac   r14: ffff82d080307d4c
> >>>>> (XEN) r15: ffff83023f5a56c8   cr0: 000000008005003b   cr4: 00000000001526e0
> >>>>> (XEN) cr3: 00000000b20fc000   cr2: 0000000000000000
> >>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> >>>>> (XEN) Xen code around <ffff82d08022fd47> 
> >>> (hap.c#local_events_need_delivery+0x27/0x40):
> >>>>> (XEN)  0d ad fa ff 48 8b 47 08 <80> 38 00 74 09 80 78 01 00 0f 94 c0 eb 02 31 
> >>> c0
> >>>>> (XEN) Xen stack trace from rsp=ffff82d080307cc8:
> >>>>> (XEN)    ffff82d080307d08 ffff82d08022fc47 0000000000000000 ffff83023f5a5000
> >>>>> (XEN)    ffff83023f5a5648 0000000000000000 ffff82d080307d4c 0000000000002400
> >>>>> (XEN)    ffff82d080307d38 ffff82d08020008c 00000000000ffffd ffff8300b1efd000
> >>>>> (XEN)    ffff83023f5a5000 ffff82d080307d4c ffff82d080307d78 ffff82d0802cad30
> >>>>> (XEN)    0000000000203000 ffff83023f5a5000 ffff82d0802bf860 0000000000000000
> >>>>> (XEN)    0000000000000001 ffff83000008bef0 ffff82d080307de8 ffff82d0802c91e0
> >>>>> (XEN)    ffff82d080307de8 ffff82d080143900 ffff82d080307de8 0000000000000000
> >>>>> (XEN)    ffff83000008bf00 ffff82d0802eb480 ffff82d080307dc4 ffff82d08028b1cd
> >>>>> (XEN)    ffff83000008bf00 0000000000000000 0000000000000001 ffff83023f5a5000
> >>>>> (XEN)    ffff82d080307f08 ffff82d0802bf0c9 0000000000000000 0000000000000000
> >>>>> (XEN)    0000000000000000 ffff82d080307f18 ffff83000008bee0 0000000000000001
> >>>>> (XEN)    0000000000000001 0000000000000001 0000000000000000 0000000000100000
> >>>>> (XEN)    0000000000000001 0000000000247000 ffff83000008bef4 0000000000100000
> >>>>> (XEN)    ffff830100000000 0000000000247001 0000000000000014 0000000100000000
> >>>>> (XEN)    ffff8300ffffffec ffff83000008bef0 ffff82d0802e0640 ffff83000008bfb0
> >>>>> (XEN)    0000000000000000 0000000000000000 0000000000000111 0000000800000000
> >>>>> (XEN)    000000010000006e 0000000000000003 00000000000002f8 0000000000000000
> >>>>> (XEN)    00000000ad5c0bd0 0000000000000000 0000000000000001 0000000000000008
> >>>>> (XEN)    0000000000000000 ffff82d080100073 0000000000000000 0000000000000000
> >>>>> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> >>>>> (XEN) Xen call trace:
> >>>>> (XEN)    [<ffff82d08022fd47>] hap.c#local_events_need_delivery+0x27/0x40
> >>>>> (XEN)    [<ffff82d08022fc47>] hap_set_allocation+0x107/0x130
> >>>>> (XEN)    [<ffff82d08020008c>] paging_set_allocation+0x4c/0x80
> >>>>> (XEN)    [<ffff82d0802cad30>] domain_build.c#hvm_setup_p2m+0x70/0x1a0
> >>>>> (XEN)    [<ffff82d0802c91e0>] domain_build.c#construct_dom0_hvm+0x60/0x120
> >>>>> (XEN)    [<ffff82d0802bf0c9>] __start_xen+0x1ea9/0x23a0
> >>>>> (XEN)    [<ffff82d080100073>] __high_start+0x53/0x60
> >>>>> (XEN)
> >>>>> (XEN) Pagetable walk from 0000000000000000:
> >>>>
> >>>> Sadly you don't make clear what pointer it is that is NULL at that point.
> >>>
> >>> It sounds from what he says in the following paragraph like current is NULL.
> >> 
> >> I don't recall us re-setting current to this late in the boot process.
> >> Even during early boot we set it to a bogus non-NULL value rather
> >> than NULL.
> >> 
> >>>>> I've tried setting current to d->vcpu[0], but that just makes the call to
> >>>>> hypercall_preempt_check crash in some scheduler assert. In any case, I've
> >>>>> added the preempt parameter to the paging_set_allocation function, but I
> >>>>> don't plan to use it in the domain builder for the time being. Does that
> >>>>> sound right?
> >>>>
> >>>> Not really, new huge latency issues like this shouldn't be reintroduced;
> >>>> we've been fighting hard to get rid of those (and we still occasionally
> >>>> find some no-one had noticed before).
> >>>
> >>> You mean latency in processing softirqs?
> >>>
> >>> Maybe what we need to do is to make local_events_need_delivery() safe
> >>> to call at this point by having it return 0 if current is NULL rather
> >>> than crashing?
> >> 
> >> That would have the same effect - no softirq processing, and hence
> >> possible time issues on huge systems.
> > 
> > No, local_events_need_delivery() only checks to see if the current vcpu has
> > outstanding virtual interrupts.  The other half of
> > hypercall_preempt_check() checks for softirqs, which doesn't appear to
> > rely on having current available.
> 
> Good point, but
> - current should nevertheless not be NULL (afaict at least),
> - hypercall_preempt_check() is probably the wrong construct,
>   as we're not in a hypercall.
> The latter of course could be addressed by, as you did suggest,
> some refinement to one of the pieces it's being made up from,
> but I'm not sure that would be better than perhaps making its
> invocation conditional (with some better alternative in the "else"
> case) in hap_set_allocation(). Not the least because any
> adjustment to hypercall_preempt_check() itself would affect all
> of its other users.

I had added the following patch to my queue in order to fix this:

---
xen/x86: allow calling hypercall_preempt_check with the idle domain

This allows using hypercall_preempt_check while building Dom0.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

diff --git a/xen/include/asm-x86/event.h b/xen/include/asm-x86/event.h
index a82062e..d55a8bd 100644
--- a/xen/include/asm-x86/event.h
+++ b/xen/include/asm-x86/event.h
@@ -23,6 +23,10 @@ int hvm_local_events_need_delivery(struct vcpu *v);
 static inline int local_events_need_delivery(void)
 {
     struct vcpu *v = current;
+
+    if ( is_idle_vcpu(v) )
+        return 0;
+
     return (has_hvm_container_vcpu(v) ? hvm_local_events_need_delivery(v) :
             (vcpu_info(v, evtchn_upcall_pending) &&
              !vcpu_info(v, evtchn_upcall_mask)));

---

But seeing your comments I now wonder whether that's appropriate. Is there 
any way in Xen to know whether Xen is in hypercall context or not?

Another way to fix this would be to change both {hap/sh}_set_allocation 
functions to only call hypercall_preempt_check if current != idle_domain, 
and in the idle domain case just call softirq_pending (which is the same 
as what the above change achieves).

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 01/12] x86/paging: introduce paging_set_allocation
  2016-08-03 16:00                   ` Roger Pau Monne
@ 2016-08-03 16:15                     ` Jan Beulich
  2016-08-03 16:24                       ` Roger Pau Monne
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2016-08-03 16:15 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: George Dunlap, Andrew Cooper, Tim Deegan, George Dunlap, xen-devel

>>> On 03.08.16 at 18:00, <roger.pau@citrix.com> wrote:
> On Wed, Aug 03, 2016 at 09:37:41AM -0600, Jan Beulich wrote:
>> >>> On 03.08.16 at 17:28, <george.dunlap@citrix.com> wrote:
>> > On 03/08/16 16:25, Jan Beulich wrote:
>> >>>>> On 03.08.16 at 17:11, <George.Dunlap@eu.citrix.com> wrote:
>> >>> On Tue, Aug 2, 2016 at 5:12 PM, Jan Beulich <JBeulich@suse.com> wrote:
>> >>>>>>> On 02.08.16 at 17:49, <roger.pau@citrix.com> wrote:
>> >>>>> On Tue, Aug 02, 2016 at 11:47:22AM +0200, Roger Pau Monne wrote:
>> >>>>>> On Fri, Jul 29, 2016 at 05:47:24PM +0100, Andrew Cooper wrote:
>> >>>>>>> As this is for the construction of dom0, it would be better to take a
>> >>>>>>> preemptible pointer, loop in construct_dom0(), with a
>> >>>>>>> process_pending_softirqs() call in between.
>> >>>>>>
>> >>>>>> Now fixed.
>> >>>>>
>> >>>>> Hm, I have to stand corrected, using hypercall_preempt_check (as
>> >>>>> any of the *_set_allocation function use), is not safe at this point:
>> >>>>>
>> >>>>> (XEN) ----[ Xen-4.8-unstable  x86_64  debug=y  Tainted:    C  ]----
>> >>>>> (XEN) CPU:    0
>> >>>>> (XEN) RIP:    e008:[<ffff82d08022fd47>] 
>> >>> hap.c#local_events_need_delivery+0x27/0x40
>> >>>>> (XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
>> >>>>> (XEN) rax: 0000000000000000   rbx: ffff83023f5a5000   rcx: ffff82d080312900
>> >>>>> (XEN) rdx: 0000000000000001   rsi: ffff83023f5a56c8   rdi: ffff8300b213d000
>> >>>>> (XEN) rbp: ffff82d080307cc8   rsp: ffff82d080307cc8   r8:  0180000000000000
>> >>>>> (XEN) r9:  0000000000000000   r10: 0000000000247000   r11: ffff82d08029a5b0
>> >>>>> (XEN) r12: 0000000000000011   r13: 00000000000023ac   r14: ffff82d080307d4c
>> >>>>> (XEN) r15: ffff83023f5a56c8   cr0: 000000008005003b   cr4: 00000000001526e0
>> >>>>> (XEN) cr3: 00000000b20fc000   cr2: 0000000000000000
>> >>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>> >>>>> (XEN) Xen code around <ffff82d08022fd47> 
>> >>> (hap.c#local_events_need_delivery+0x27/0x40):
>> >>>>> (XEN)  0d ad fa ff 48 8b 47 08 <80> 38 00 74 09 80 78 01 00 0f 94 c0 eb 02 31 
> 
>> >>> c0
>> >>>>> (XEN) Xen stack trace from rsp=ffff82d080307cc8:
>> >>>>> (XEN)    ffff82d080307d08 ffff82d08022fc47 0000000000000000 
> ffff83023f5a5000
>> >>>>> (XEN)    ffff83023f5a5648 0000000000000000 ffff82d080307d4c 
> 0000000000002400
>> >>>>> (XEN)    ffff82d080307d38 ffff82d08020008c 00000000000ffffd 
> ffff8300b1efd000
>> >>>>> (XEN)    ffff83023f5a5000 ffff82d080307d4c ffff82d080307d78 
> ffff82d0802cad30
>> >>>>> (XEN)    0000000000203000 ffff83023f5a5000 ffff82d0802bf860 
> 0000000000000000
>> >>>>> (XEN)    0000000000000001 ffff83000008bef0 ffff82d080307de8 
> ffff82d0802c91e0
>> >>>>> (XEN)    ffff82d080307de8 ffff82d080143900 ffff82d080307de8 
> 0000000000000000
>> >>>>> (XEN)    ffff83000008bf00 ffff82d0802eb480 ffff82d080307dc4 
> ffff82d08028b1cd
>> >>>>> (XEN)    ffff83000008bf00 0000000000000000 0000000000000001 
> ffff83023f5a5000
>> >>>>> (XEN)    ffff82d080307f08 ffff82d0802bf0c9 0000000000000000 
> 0000000000000000
>> >>>>> (XEN)    0000000000000000 ffff82d080307f18 ffff83000008bee0 
> 0000000000000001
>> >>>>> (XEN)    0000000000000001 0000000000000001 0000000000000000 
> 0000000000100000
>> >>>>> (XEN)    0000000000000001 0000000000247000 ffff83000008bef4 
> 0000000000100000
>> >>>>> (XEN)    ffff830100000000 0000000000247001 0000000000000014 
> 0000000100000000
>> >>>>> (XEN)    ffff8300ffffffec ffff83000008bef0 ffff82d0802e0640 
> ffff83000008bfb0
>> >>>>> (XEN)    0000000000000000 0000000000000000 0000000000000111 
> 0000000800000000
>> >>>>> (XEN)    000000010000006e 0000000000000003 00000000000002f8 
> 0000000000000000
>> >>>>> (XEN)    00000000ad5c0bd0 0000000000000000 0000000000000001 
> 0000000000000008
>> >>>>> (XEN)    0000000000000000 ffff82d080100073 0000000000000000 
> 0000000000000000
>> >>>>> (XEN)    0000000000000000 0000000000000000 0000000000000000 
> 0000000000000000
>> >>>>> (XEN) Xen call trace:
>> >>>>> (XEN)    [<ffff82d08022fd47>] hap.c#local_events_need_delivery+0x27/0x40
>> >>>>> (XEN)    [<ffff82d08022fc47>] hap_set_allocation+0x107/0x130
>> >>>>> (XEN)    [<ffff82d08020008c>] paging_set_allocation+0x4c/0x80
>> >>>>> (XEN)    [<ffff82d0802cad30>] domain_build.c#hvm_setup_p2m+0x70/0x1a0
>> >>>>> (XEN)    [<ffff82d0802c91e0>] domain_build.c#construct_dom0_hvm+0x60/0x120
>> >>>>> (XEN)    [<ffff82d0802bf0c9>] __start_xen+0x1ea9/0x23a0
>> >>>>> (XEN)    [<ffff82d080100073>] __high_start+0x53/0x60
>> >>>>> (XEN)
>> >>>>> (XEN) Pagetable walk from 0000000000000000:
>> >>>>
>> >>>> Sadly you don't make clear what pointer it is that is NULL at that point.
>> >>>
>> >>> It sounds from what he says in the following paragraph like current is 
> NULL.
>> >> 
>> >> I don't recall us re-setting current to this late in the boot process.
>> >> Even during early boot we set it to a bogus non-NULL value rather
>> >> than NULL.
>> >> 
>> >>>>> I've tried setting current to d->vcpu[0], but that just makes the call to
>> >>>>> hypercall_preempt_check crash in some scheduler assert. In any case, I've
>> >>>>> added the preempt parameter to the paging_set_allocation function, but I
>> >>>>> don't plan to use it in the domain builder for the time being. Does that
>> >>>>> sound right?
>> >>>>
>> >>>> Not really, new huge latency issues like this shouldn't be reintroduced;
>> >>>> we've been fighting hard to get rid of those (and we still occasionally
>> >>>> find some no-one had noticed before).
>> >>>
>> >>> You mean latency in processing softirqs?
>> >>>
>> >>> Maybe what we need to do is to make local_events_need_delivery() safe
>> >>> to call at this point by having it return 0 if current is NULL rather
>> >>> than crashing?
>> >> 
>> >> That would have the same effect - no softirq processing, and hence
>> >> possible time issues on huge systems.
>> > 
>> > No, local_events_need_delivery() only checks to see if the current vcpu has
>> > outstanding virtual interrupts.  The other half of
>> > hypercall_preempt_check() checks for softirqs, which doesn't appear to
>> > rely on having current available.
>> 
>> Good point, but
>> - current should nevertheless not be NULL (afaict at least),
>> - hypercall_preempt_check() is probably the wrong construct,
>>   as we're not in a hypercall.
>> The latter of course could be addressed by, as you did suggest,
>> some refinement to one of the pieces it's being made up from,
>> but I'm not sure that would be better than perhaps making its
>> invocation conditional (with some better alternative in the "else"
>> case) in hap_set_allocation(). Not the least because any
>> adjustment to hypercall_preempt_check() itself would affect all
>> of its other users.
> 
> I had added the following patch to my queue in order to fix this:
> 
> ---
> xen/x86: allow calling hypercall_preempt_check with the idle domain
> 
> This allows using hypercall_preempt_check while building Dom0.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> diff --git a/xen/include/asm-x86/event.h b/xen/include/asm-x86/event.h
> index a82062e..d55a8bd 100644
> --- a/xen/include/asm-x86/event.h
> +++ b/xen/include/asm-x86/event.h
> @@ -23,6 +23,10 @@ int hvm_local_events_need_delivery(struct vcpu *v);
>  static inline int local_events_need_delivery(void)
>  {
>      struct vcpu *v = current;
> +
> +    if ( is_idle_vcpu(v) )
> +        return 0;

As said, I think it would be better to not add it here, unless there
is a significant amount of other calls into here from idle vCPU-s
with your changes.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 01/12] x86/paging: introduce paging_set_allocation
  2016-08-03 16:15                     ` Jan Beulich
@ 2016-08-03 16:24                       ` Roger Pau Monne
  2016-08-04  6:19                         ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Roger Pau Monne @ 2016-08-03 16:24 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Andrew Cooper, Tim Deegan, George Dunlap, xen-devel

On Wed, Aug 03, 2016 at 10:15:51AM -0600, Jan Beulich wrote:
> >>> On 03.08.16 at 18:00, <roger.pau@citrix.com> wrote:
> > On Wed, Aug 03, 2016 at 09:37:41AM -0600, Jan Beulich wrote:
> >> >>> On 03.08.16 at 17:28, <george.dunlap@citrix.com> wrote:
> >> > On 03/08/16 16:25, Jan Beulich wrote:
> >> >>>>> On 03.08.16 at 17:11, <George.Dunlap@eu.citrix.com> wrote:
> >> >>> On Tue, Aug 2, 2016 at 5:12 PM, Jan Beulich <JBeulich@suse.com> wrote:
> >> >>>>>>> On 02.08.16 at 17:49, <roger.pau@citrix.com> wrote:
> >> >>>>> On Tue, Aug 02, 2016 at 11:47:22AM +0200, Roger Pau Monne wrote:
> >> >>>>>> On Fri, Jul 29, 2016 at 05:47:24PM +0100, Andrew Cooper wrote:
> >> >>>>>>> As this is for the construction of dom0, it would be better to take a
> >> >>>>>>> preemptible pointer, loop in construct_dom0(), with a
> >> >>>>>>> process_pending_softirqs() call in between.
> >> >>>>>>
> >> >>>>>> Now fixed.
> >> >>>>>
> >> >>>>> Hm, I have to stand corrected, using hypercall_preempt_check (as
> >> >>>>> any of the *_set_allocation function use), is not safe at this point:
> >> >>>>>
> >> >>>>> (XEN) ----[ Xen-4.8-unstable  x86_64  debug=y  Tainted:    C  ]----
> >> >>>>> (XEN) CPU:    0
> >> >>>>> (XEN) RIP:    e008:[<ffff82d08022fd47>] 
> >> >>> hap.c#local_events_need_delivery+0x27/0x40
> >> >>>>> (XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor
> >> >>>>> (XEN) rax: 0000000000000000   rbx: ffff83023f5a5000   rcx: ffff82d080312900
> >> >>>>> (XEN) rdx: 0000000000000001   rsi: ffff83023f5a56c8   rdi: ffff8300b213d000
> >> >>>>> (XEN) rbp: ffff82d080307cc8   rsp: ffff82d080307cc8   r8:  0180000000000000
> >> >>>>> (XEN) r9:  0000000000000000   r10: 0000000000247000   r11: ffff82d08029a5b0
> >> >>>>> (XEN) r12: 0000000000000011   r13: 00000000000023ac   r14: ffff82d080307d4c
> >> >>>>> (XEN) r15: ffff83023f5a56c8   cr0: 000000008005003b   cr4: 00000000001526e0
> >> >>>>> (XEN) cr3: 00000000b20fc000   cr2: 0000000000000000
> >> >>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> >> >>>>> (XEN) Xen code around <ffff82d08022fd47> 
> >> >>> (hap.c#local_events_need_delivery+0x27/0x40):
> >> >>>>> (XEN)  0d ad fa ff 48 8b 47 08 <80> 38 00 74 09 80 78 01 00 0f 94 c0 eb 02 31 
> > 
> >> >>> c0
> >> >>>>> (XEN) Xen stack trace from rsp=ffff82d080307cc8:
> >> >>>>> (XEN)    ffff82d080307d08 ffff82d08022fc47 0000000000000000 
> > ffff83023f5a5000
> >> >>>>> (XEN)    ffff83023f5a5648 0000000000000000 ffff82d080307d4c 
> > 0000000000002400
> >> >>>>> (XEN)    ffff82d080307d38 ffff82d08020008c 00000000000ffffd 
> > ffff8300b1efd000
> >> >>>>> (XEN)    ffff83023f5a5000 ffff82d080307d4c ffff82d080307d78 
> > ffff82d0802cad30
> >> >>>>> (XEN)    0000000000203000 ffff83023f5a5000 ffff82d0802bf860 
> > 0000000000000000
> >> >>>>> (XEN)    0000000000000001 ffff83000008bef0 ffff82d080307de8 
> > ffff82d0802c91e0
> >> >>>>> (XEN)    ffff82d080307de8 ffff82d080143900 ffff82d080307de8 
> > 0000000000000000
> >> >>>>> (XEN)    ffff83000008bf00 ffff82d0802eb480 ffff82d080307dc4 
> > ffff82d08028b1cd
> >> >>>>> (XEN)    ffff83000008bf00 0000000000000000 0000000000000001 
> > ffff83023f5a5000
> >> >>>>> (XEN)    ffff82d080307f08 ffff82d0802bf0c9 0000000000000000 
> > 0000000000000000
> >> >>>>> (XEN)    0000000000000000 ffff82d080307f18 ffff83000008bee0 
> > 0000000000000001
> >> >>>>> (XEN)    0000000000000001 0000000000000001 0000000000000000 
> > 0000000000100000
> >> >>>>> (XEN)    0000000000000001 0000000000247000 ffff83000008bef4 
> > 0000000000100000
> >> >>>>> (XEN)    ffff830100000000 0000000000247001 0000000000000014 
> > 0000000100000000
> >> >>>>> (XEN)    ffff8300ffffffec ffff83000008bef0 ffff82d0802e0640 
> > ffff83000008bfb0
> >> >>>>> (XEN)    0000000000000000 0000000000000000 0000000000000111 
> > 0000000800000000
> >> >>>>> (XEN)    000000010000006e 0000000000000003 00000000000002f8 
> > 0000000000000000
> >> >>>>> (XEN)    00000000ad5c0bd0 0000000000000000 0000000000000001 
> > 0000000000000008
> >> >>>>> (XEN)    0000000000000000 ffff82d080100073 0000000000000000 
> > 0000000000000000
> >> >>>>> (XEN)    0000000000000000 0000000000000000 0000000000000000 
> > 0000000000000000
> >> >>>>> (XEN) Xen call trace:
> >> >>>>> (XEN)    [<ffff82d08022fd47>] hap.c#local_events_need_delivery+0x27/0x40
> >> >>>>> (XEN)    [<ffff82d08022fc47>] hap_set_allocation+0x107/0x130
> >> >>>>> (XEN)    [<ffff82d08020008c>] paging_set_allocation+0x4c/0x80
> >> >>>>> (XEN)    [<ffff82d0802cad30>] domain_build.c#hvm_setup_p2m+0x70/0x1a0
> >> >>>>> (XEN)    [<ffff82d0802c91e0>] domain_build.c#construct_dom0_hvm+0x60/0x120
> >> >>>>> (XEN)    [<ffff82d0802bf0c9>] __start_xen+0x1ea9/0x23a0
> >> >>>>> (XEN)    [<ffff82d080100073>] __high_start+0x53/0x60
> >> >>>>> (XEN)
> >> >>>>> (XEN) Pagetable walk from 0000000000000000:
> >> >>>>
> >> >>>> Sadly you don't make clear what pointer it is that is NULL at that point.
> >> >>>
> >> >>> It sounds from what he says in the following paragraph like current is 
> > NULL.
> >> >> 
> >> >> I don't recall us re-setting current to this late in the boot process.
> >> >> Even during early boot we set it to a bogus non-NULL value rather
> >> >> than NULL.
> >> >> 
> >> >>>>> I've tried setting current to d->vcpu[0], but that just makes the call to
> >> >>>>> hypercall_preempt_check crash in some scheduler assert. In any case, I've
> >> >>>>> added the preempt parameter to the paging_set_allocation function, but I
> >> >>>>> don't plan to use it in the domain builder for the time being. Does that
> >> >>>>> sound right?
> >> >>>>
> >> >>>> Not really, new huge latency issues like this shouldn't be reintroduced;
> >> >>>> we've been fighting hard to get rid of those (and we still occasionally
> >> >>>> find some no-one had noticed before).
> >> >>>
> >> >>> You mean latency in processing softirqs?
> >> >>>
> >> >>> Maybe what we need to do is to make local_events_need_delivery() safe
> >> >>> to call at this point by having it return 0 if current is NULL rather
> >> >>> than crashing?
> >> >> 
> >> >> That would have the same effect - no softirq processing, and hence
> >> >> possible time issues on huge systems.
> >> > 
> >> > No, local_events_need_delivery() only checks to see if the current vcpu has
> >> > outstanding virtual interrupts.  The other half of
> >> > hypercall_preempt_check() checks for softirqs, which doesn't appear to
> >> > rely on having current available.
> >> 
> >> Good point, but
> >> - current should nevertheless not be NULL (afaict at least),
> >> - hypercall_preempt_check() is probably the wrong construct,
> >>   as we're not in a hypercall.
> >> The latter of course could be addressed by, as you did suggest,
> >> some refinement to one of the pieces it's being made up from,
> >> but I'm not sure that would be better than perhaps making its
> >> invocation conditional (with some better alternative in the "else"
> >> case) in hap_set_allocation(). Not the least because any
> >> adjustment to hypercall_preempt_check() itself would affect all
> >> of its other users.
> > 
> > I had added the following patch to my queue in order to fix this:
> > 
> > ---
> > xen/x86: allow calling hypercall_preempt_check with the idle domain
> > 
> > This allows using hypercall_preempt_check while building Dom0.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > 
> > diff --git a/xen/include/asm-x86/event.h b/xen/include/asm-x86/event.h
> > index a82062e..d55a8bd 100644
> > --- a/xen/include/asm-x86/event.h
> > +++ b/xen/include/asm-x86/event.h
> > @@ -23,6 +23,10 @@ int hvm_local_events_need_delivery(struct vcpu *v);
> >  static inline int local_events_need_delivery(void)
> >  {
> >      struct vcpu *v = current;
> > +
> > +    if ( is_idle_vcpu(v) )
> > +        return 0;
> 
> As said, I think it would be better to not add it here, unless there
> is a significant amount of other calls into here from idle vCPU-s
> with your changes.

No, the only functions that I use that call hypercall_preempt_check are the 
_set_allocation ones. I would like to add an ASSERT here to make sure 
local_events_need_delivery is not called with current == idle_vcpu.

In any case, I would go with route 2 and modify _set_allocation to call 
softirq_pending instead of hypercall_preempt_check if current == idle_vcpu. 
This should solve the issue and doesn't involve changing 
hypercall_preempt_check itself.
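
Roughly this, in both hap_set_allocation and sh_set_allocation (just a
sketch, the exact surrounding code may differ a bit):

    /* Where hypercall_preempt_check() is currently used to decide
     * whether to bail out and let the caller retry: */
    if ( preempted &&
         (is_idle_vcpu(current) ? softirq_pending(smp_processor_id())
                                : hypercall_preempt_check()) )
    {
        *preempted = 1;
        return 0;
    }

plus the ASSERT(!is_idle_vcpu(current)) at the top of
local_events_need_delivery() mentioned above.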

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 01/12] x86/paging: introduce paging_set_allocation
  2016-08-03 16:24                       ` Roger Pau Monne
@ 2016-08-04  6:19                         ` Jan Beulich
  0 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2016-08-04  6:19 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: George Dunlap, Andrew Cooper, Tim Deegan, George Dunlap, xen-devel

>>> On 03.08.16 at 18:24, <roger.pau@citrix.com> wrote:
> On Wed, Aug 03, 2016 at 10:15:51AM -0600, Jan Beulich wrote:
>> >>> On 03.08.16 at 18:00, <roger.pau@citrix.com> wrote:
>> > --- a/xen/include/asm-x86/event.h
>> > +++ b/xen/include/asm-x86/event.h
>> > @@ -23,6 +23,10 @@ int hvm_local_events_need_delivery(struct vcpu *v);
>> >  static inline int local_events_need_delivery(void)
>> >  {
>> >      struct vcpu *v = current;
>> > +
>> > +    if ( is_idle_vcpu(v) )
>> > +        return 0;
>> 
>> As said, I think it would be better to not add it here, unless there
>> is a significant amount of other calls into here from idle vCPU-s
>> with your changes.
> 
> No, the only functions that I use that call hypercall_preempt_check are the 
> _set_allocation ones. I would like to add an ASSERT here to make sure 
> local_events_need_delivery is not called with current == idle_vcpu.

That's fine with me.

> In any case, I would go with route 2 and modify _set_allocation to call 
> softirq_pending instead of hypercall_preempt_check if current == idle_vcpu. 
> This should solve the issue and doesn't involve changing 
> hypercall_preempt_check itself.

Thanks.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 04/12] xen/x86: split Dom0 build into PV and PVHv2
  2016-08-01 11:36     ` Roger Pau Monne
@ 2016-08-04 18:28       ` Andrew Cooper
  0 siblings, 0 replies; 49+ messages in thread
From: Andrew Cooper @ 2016-08-04 18:28 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Jan Beulich

On 01/08/16 12:36, Roger Pau Monne wrote:
> On Fri, Jul 29, 2016 at 06:57:08PM +0100, Andrew Cooper wrote:
>> On 29/07/16 17:28, Roger Pau Monne wrote:
>>> Split the Dom0 builder into two different functions, one for PV (and classic
>>> PVH), and another one for PVHv2. Introduce a new command line parameter,
>>> dom0hvm in order to request the creation of a PVHv2 Dom0.
>>>
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> ---
>>> Cc: Jan Beulich <jbeulich@suse.com>
>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>> ---
>>>  xen/arch/x86/domain_build.c | 27 ++++++++++++++++++++++++++-
>>>  xen/arch/x86/setup.c        |  9 +++++++++
>> A patch to docs/misc/xen-command-line.markdown please.
> OK, I wasn't really sure if we want to introduce a new command line option, 
> or just use dom0pvh. In any case I will add it, we can always alias dom0hvm 
> to dom0pvh (or the other way around) when classic PVH support is 
> removed.

I am not terribly fussed, so long as the docs match the hypervisor
behaviour :)
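
As a strawman (assuming the dom0hvm name sticks), the entry could follow
the pattern of the other boolean options in xen-command-line.markdown,
something along these lines:

    ### dom0hvm
    > `= <boolean>`

    > Default: `false`

    Flag that makes dom0 be built as a PVHv2 (HVM-style) domain rather
    than PV.

Exact wording is of course up to you.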

>  
>>>  2 files changed, 35 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
>>> index 09d79be..c0ef40f 100644
>>> --- a/xen/arch/x86/domain_build.c
>>> +++ b/xen/arch/x86/domain_build.c
>>> @@ -952,7 +952,7 @@ static int __init setup_permissions(struct domain *d)
>>>      return rc;
>>>  }
>>>  
>>> -int __init construct_dom0(
>>> +static int __init construct_dom0_pv(
>>>      struct domain *d,
>>>      const module_t *image, unsigned long image_headroom,
>>>      module_t *initrd,
>>> @@ -1647,6 +1647,31 @@ out:
>>>      return rc;
>>>  }
>>>  
>>> +static int __init construct_dom0_hvm(struct domain *d, const module_t *image,
>>> +                                     unsigned long image_headroom,
>>> +                                     module_t *initrd,
>>> +                                     void *(*bootstrap_map)(const module_t *),
>>> +                                     char *cmdline)
>>> +{
>>> +
>>> +    printk("** Building a PVH Dom0 **\n");
>> Some naming curiosities here, especially given the parameter name.
> Hm, AFAIK we agreed on keeping the 'PVH' naming, but since internally Xen 
> has no concept of 'PVH' I think the constructor is better named as HVM (and 
> in fact if PVH wasn't there before I would just consider this a HVM Dom0).
>
> If people prefer HVM I can certainly change it, but I think it's going to 
> get messy.

Fair enough.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 06/12] xen/x86: populate PVHv2 Dom0 physical memory map
  2016-08-02  9:19     ` Roger Pau Monne
@ 2016-08-04 18:43       ` Andrew Cooper
  2016-08-05  9:40         ` Roger Pau Monne
  0 siblings, 1 reply; 49+ messages in thread
From: Andrew Cooper @ 2016-08-04 18:43 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Jan Beulich

On 02/08/16 10:19, Roger Pau Monne wrote:
> On Fri, Jul 29, 2016 at 08:04:12PM +0100, Andrew Cooper wrote:
>> On 29/07/16 17:29, Roger Pau Monne wrote:
>>> Craft the Dom0 e820 memory map and populate it.
>>>
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> ---
>>> Cc: Jan Beulich <jbeulich@suse.com>
>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>> ---
>>>  xen/arch/x86/domain_build.c | 199 ++++++++++++++++++++++++++++++++++++++++++--
>>>  1 file changed, 193 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
>>> index c0ef40f..cb8ecbd 100644
>>> --- a/xen/arch/x86/domain_build.c
>>> +++ b/xen/arch/x86/domain_build.c
>>> @@ -43,6 +43,11 @@ static long __initdata dom0_nrpages;
>>>  static long __initdata dom0_min_nrpages;
>>>  static long __initdata dom0_max_nrpages = LONG_MAX;
>>>  
>>> +/* GFN of the identity map for EPT. */
>>> +#define HVM_IDENT_PT_GFN  0xfeffeu
>> The IDENT_PT is only needed on Gen1 VT-x hardware which doesn't support
>> unrestricted-guest mode.  OTOH, it needs to be matched with VM86_TSS,
>> which is also needed if hardware doesn't support unrestricted-guest. 
>> Also, the IDENT_PT can live anywhere in GFN space.  0xfeffe is just
>> convention for HVM guests as nothing else interesting lives there, but a
>> PVH dom0 will want to ensure it doesn't alias anything interesting it
>> might wish to map.
>>
>> Now I look at it, there is substantial WTF.  The domain builder makes
>> IDENT_PT, but HVMLoader makes VM86_TSS.  I presume this is a historical
>> side effect of IDENT_PT being a prerequisite to even executing
>> hvmloader, while VM86_TSS was only a prerequisite to executing the
>> rombios payload.  Either way, eww :(
>>
>> I think the VM86_TSS setup needs to move to unconditionally being beside
>> IDENT_PT setup in the domain builder, and mirrored here in dom0_hvm()
>> creation.  I don't think it is reasonable to expect an HVMLite payload
>> to set up its own VM86_TSS if it didn't set up IDENT_PT.  (OTOH, the
>> chances of an HVMLite guest ever needing to return to real mode is slim,
>> but in the name of flexibility, it would be nice not to rule it out).
>>
>> Longterm, it would be nice to disentangle the unrestricted-guest support
>> from the main vmx code, and make it able to be compiled out.  There is
>> lots of it, and it is all-but-dead on modern hardware.
> Thanks! I didn't know anything about the VM86 TSS stuff, the fact that the 
> identity page tables and the VM86 TSS are set at completely different points 
> makes it quite hard to follow :/.
>
> I've now moved the setup of the VM86 TSS inside the domain builder for both 
> DomU and Dom0.

Thanks.  (Guess who has recently had to delve back into this code^W swamp.)

>
>>> +}
>>> +
>>> +/* Calculate the biggest usable order given a size in bytes. */
>>> +static inline unsigned int get_order(uint64_t size)
>>> +{
>>> +    unsigned int order;
>>> +    uint64_t pg;
>>> +
>>> +    ASSERT((size & ~PAGE_MASK) == 0);
>>> +    pg = PFN_DOWN(size);
>>> +    for ( order = 0; pg >= (1 << (order + 1)); order++ );
>>> +
>>> +    return order;
>>> +}
>> We already have get_order_from_bytes() and get_order_from_pages(), the
>> latter of which looks like it will suit your usecase.
> Not really, or at least they don't do the same as get_order. This function 
> calculates the maximum order you can use so that there are no pages left 
> over (ie: if you have a size of 3145728 bytes (3MiB), this function will 
> return order 9 (2MiB), while the other ones will return order 10 (4MiB)). I 
> don't really understand why other places in the code request bigger orders and 
> then free the leftovers, isn't this also causing memory shattering?

Sounds like we want something like get_order_{floor,ceil}() which makes
it obvious which way non-power-of-two sizes get rounded.

>
>> As a TODO item, I really would like to borrow some of the Linux
>> infrastructure to calculate orders of constants at compile time, because
>> a large number of callers of the existing get_order_*() functions are
>> performing unnecessary runtime calculation.  For those which need
>> runtime calculation, some careful use of ffs() would be preferable to a
>> loop.
>>
>>> +
>>> +/* Populate an HVM memory range using the biggest possible order. */
>>> +static void __init hvm_populate_memory_range(struct domain *d, uint64_t start,
>>> +                                             uint64_t size)
>>> +{
>>> +    static unsigned int __initdata memflags = MEMF_no_dma|MEMF_exact_node;
>>> +    unsigned int order;
>>> +    struct page_info *page;
>>> +    int rc;
>>> +
>>> +    ASSERT((size & ~PAGE_MASK) == 0 && (start & ~PAGE_MASK) == 0);
>>> +
>>> +    order = MAX_ORDER;
>>> +    while ( size != 0 )
>>> +    {
>>> +        order = min(get_order(size), order);
>>> +        page = alloc_domheap_pages(d, order, memflags);
>>> +        if ( page == NULL )
>>> +        {
>>> +            if ( order == 0 && memflags )
>>> +            {
>>> +                /* Try again without any memflags. */
>>> +                memflags = 0;
>>> +                order = MAX_ORDER;
>>> +                continue;
>>> +            }
>>> +            if ( order == 0 )
>>> +                panic("Unable to allocate memory with order 0!\n");
>>> +            order--;
>>> +            continue;
>>> +        }
>> It would be far more efficient to try and allocate only 1G and 2M
>> blocks.  Most of memory is free at this point, and it would allow the
>> use of HAP superpage mappings, which will be a massive performance boost
>> if they aren't shattered.
> That's what I'm trying to do, but we might have to use pages of lower order 
> when filling the smaller gaps.

As a general principle, we should try not to have any gaps.  This also
extends to guests using more intelligence when deciding how to mutate
their physmaps.

>  As an example, this are the stats when 
> building a domain with 6048M of RAM:
>
> (XEN) Memory allocation stats:
> (XEN) Order 18: 5GB
> (XEN) Order 17: 512MB
> (XEN) Order 15: 256MB
> (XEN) Order 14: 128MB
> (XEN) Order 12: 16MB
> (XEN) Order 10: 8MB
> (XEN) Order  9: 4MB
> (XEN) Order  8: 2MB
> (XEN) Order  7: 1MB
> (XEN) Order  6: 512KB
> (XEN) Order  5: 256KB
> (XEN) Order  4: 128KB
> (XEN) Order  3: 64KB
> (XEN) Order  2: 32KB
> (XEN) Order  1: 16KB
> (XEN) Order  0: 4KB
>
> IMHO, they are quite good.

What are the RAM characteristics of the host?  Do you have any idea what
the hap superpage characteristics are like after the guest has booted?

In a case like this, I think it would be entirely reasonable to round up
to the nearest 2MB, and avoid all of those small page mappings.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 06/12] xen/x86: populate PVHv2 Dom0 physical memory map
  2016-08-04 18:43       ` Andrew Cooper
@ 2016-08-05  9:40         ` Roger Pau Monne
  2016-08-11 18:28           ` Andrew Cooper
  0 siblings, 1 reply; 49+ messages in thread
From: Roger Pau Monne @ 2016-08-05  9:40 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Jan Beulich

On Thu, Aug 04, 2016 at 07:43:39PM +0100, Andrew Cooper wrote:
> On 02/08/16 10:19, Roger Pau Monne wrote:
> > On Fri, Jul 29, 2016 at 08:04:12PM +0100, Andrew Cooper wrote:
> >> On 29/07/16 17:29, Roger Pau Monne wrote:
> >>> +/* Calculate the biggest usable order given a size in bytes. */
> >>> +static inline unsigned int get_order(uint64_t size)
> >>> +{
> >>> +    unsigned int order;
> >>> +    uint64_t pg;
> >>> +
> >>> +    ASSERT((size & ~PAGE_MASK) == 0);
> >>> +    pg = PFN_DOWN(size);
> >>> +    for ( order = 0; pg >= (1 << (order + 1)); order++ );
> >>> +
> >>> +    return order;
> >>> +}
> >> We already have get_order_from_bytes() and get_order_from_pages(), the
> >> latter of which looks like it will suit your usecase.
> > Not really, or at least they don't do the same as get_order. This function 
> > calculates the maximum order you can use so that there are no pages left 
> > over (ie: if you have a size of 3145728 bytes (3MiB), this function will 
> > return order 9 (2MiB), while the other ones will return order 10 (4MiB)). I 
> > don't really understand why other places in the code request bigger orders and 
> > then free the leftovers, isn't this also causing memory shattering?
> 
> Sounds like we want something like get_order_{floor,ceil}() which makes
> it obvious which way non-power-of-two sizes get rounded.

Right, that makes sense, will rename the current one to ceil, and add the 
floor variant.
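
Something along these lines (sketch only, not even compile tested):

    /* Largest order such that no memory is left over after allocation. */
    static inline unsigned int get_order_floor(uint64_t size)
    {
        unsigned int order;

        ASSERT((size & ~PAGE_MASK) == 0);
        for ( order = 0; (PAGE_SIZE << (order + 1)) <= size; order++ );

        return order;
    }

    /* Smallest order big enough to cover the whole size. */
    static inline unsigned int get_order_ceil(uint64_t size)
    {
        unsigned int order = get_order_floor(size);

        return ((PAGE_SIZE << order) < size) ? order + 1 : order;
    }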

> >>> +            if ( order == 0 && memflags )
> >>> +            {
> >>> +                /* Try again without any memflags. */
> >>> +                memflags = 0;
> >>> +                order = MAX_ORDER;
> >>> +                continue;
> >>> +            }
> >>> +            if ( order == 0 )
> >>> +                panic("Unable to allocate memory with order 0!\n");
> >>> +            order--;
> >>> +            continue;
> >>> +        }
> >> It would be far more efficient to try and allocate only 1G and 2M
> >> blocks.  Most of memory is free at this point, and it would allow the
> >> use of HAP superpage mappings, which will be a massive performance boost
> >> if they aren't shattered.
> > That's what I'm trying to do, but we might have to use pages of lower order 
> > when filling the smaller gaps.
> 
> As a general principle, we should try not to have any gaps.  This also
> extends to guests using more intelligence when deciding how to mutate
> their physmaps.

Yes, but in this case we are limited by the original e820 from the host.
A DomU (without passthrough) will have all its memory allocated contiguously.
 
> >  As an example, this are the stats when 
> > building a domain with 6048M of RAM:
> >
> > (XEN) Memory allocation stats:
> > (XEN) Order 18: 5GB
> > (XEN) Order 17: 512MB
> > (XEN) Order 15: 256MB
> > (XEN) Order 14: 128MB
> > (XEN) Order 12: 16MB
> > (XEN) Order 10: 8MB
> > (XEN) Order  9: 4MB
> > (XEN) Order  8: 2MB
> > (XEN) Order  7: 1MB
> > (XEN) Order  6: 512KB
> > (XEN) Order  5: 256KB
> > (XEN) Order  4: 128KB
> > (XEN) Order  3: 64KB
> > (XEN) Order  2: 32KB
> > (XEN) Order  1: 16KB
> > (XEN) Order  0: 4KB
> >
> > IMHO, they are quite good.
> 
> What are the RAM characteristics of the host?  Do you have any idea what
> the hap superpage characteristics are like after the guest has booted?

This is the host RAM map:

(XEN)  0000000000000000 - 000000000009c800 (usable)
(XEN)  000000000009c800 - 00000000000a0000 (reserved)
(XEN)  00000000000e0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 00000000ad662000 (usable)
(XEN)  00000000ad662000 - 00000000adb1f000 (reserved)
(XEN)  00000000adb1f000 - 00000000b228b000 (usable)
(XEN)  00000000b228b000 - 00000000b2345000 (reserved)
(XEN)  00000000b2345000 - 00000000b236a000 (ACPI data)
(XEN)  00000000b236a000 - 00000000b2c9a000 (ACPI NVS)
(XEN)  00000000b2c9a000 - 00000000b2fff000 (reserved)
(XEN)  00000000b2fff000 - 00000000b3000000 (usable)
(XEN)  00000000b3800000 - 00000000b8000000 (reserved)
(XEN)  00000000f8000000 - 00000000fc000000 (reserved)
(XEN)  00000000fec00000 - 00000000fec01000 (reserved)
(XEN)  00000000fed00000 - 00000000fed04000 (reserved)
(XEN)  00000000fed1c000 - 00000000fed20000 (reserved)
(XEN)  00000000fee00000 - 00000000fee01000 (reserved)
(XEN)  00000000ff000000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000000247000000 (usable)

No idea about the HAP superpage characteristics, how can I fetch this 
information? (I know I can dump the guest EPT tables, but that just 
saturates the console).

> In a case like this, I think it would be entirely reasonable to round up
> to the nearest 2MB, and avoid all of those small page mappings.

Hm, but then we would be expanding the RAM region and we should either 
modify the guest e820 to reflect that (which I think is a bad idea, given 
that we might be shadowing MMIO regions), or simply use a 2MB page to cover 
a 4KB hole, in which case we are throwing away memory and the computation of 
the required memory will be off.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 06/12] xen/x86: populate PVHv2 Dom0 physical memory map
  2016-08-05  9:40         ` Roger Pau Monne
@ 2016-08-11 18:28           ` Andrew Cooper
  0 siblings, 0 replies; 49+ messages in thread
From: Andrew Cooper @ 2016-08-11 18:28 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Jan Beulich

On 05/08/16 10:40, Roger Pau Monne wrote:
> On Thu, Aug 04, 2016 at 07:43:39PM +0100, Andrew Cooper wrote:
>> On 02/08/16 10:19, Roger Pau Monne wrote:
>>> On Fri, Jul 29, 2016 at 08:04:12PM +0100, Andrew Cooper wrote:
>>>> On 29/07/16 17:29, Roger Pau Monne wrote:
>>>>> +/* Calculate the biggest usable order given a size in bytes. */
>>>>> +static inline unsigned int get_order(uint64_t size)
>>>>> +{
>>>>> +    unsigned int order;
>>>>> +    uint64_t pg;
>>>>> +
>>>>> +    ASSERT((size & ~PAGE_MASK) == 0);
>>>>> +    pg = PFN_DOWN(size);
>>>>> +    for ( order = 0; pg >= (1 << (order + 1)); order++ );
>>>>> +
>>>>> +    return order;
>>>>> +}
>>>> We already have get_order_from_bytes() and get_order_from_pages(), the
>>>> latter of which looks like it will suit your usecase.
>>> Not really, or at least they don't do the same as get_order. This function 
>>> calculates the maximum order you can use so that there are no pages left 
>>> over (ie: if you have a size of 3145728 bytes (3MiB), this function will 
>>> return order 9 (2MiB), while the other ones will return order 10 (4MiB)). I 
>>> don't really understand why other places in the code request bigger orders and 
>>> then free the leftovers, isn't this also causing memory shattering?
>> Sounds like we want something like get_order_{floor,ceil}() which makes
>> it obvious which way non-power-of-two sizes get rounded.
> Right, that makes sense, will rename the current one to ceil, and add the 
> floor variant.
>
>>>>> +            if ( order == 0 && memflags )
>>>>> +            {
>>>>> +                /* Try again without any memflags. */
>>>>> +                memflags = 0;
>>>>> +                order = MAX_ORDER;
>>>>> +                continue;
>>>>> +            }
>>>>> +            if ( order == 0 )
>>>>> +                panic("Unable to allocate memory with order 0!\n");
>>>>> +            order--;
>>>>> +            continue;
>>>>> +        }
>>>> It would be far more efficient to try and allocate only 1G and 2M
>>>> blocks.  Most of memory is free at this point, and it would allow the
>>>> use of HAP superpage mappings, which will be a massive performance boost
>>>> if they aren't shattered.
>>> That's what I'm trying to do, but we might have to use pages of lower order 
>>> when filling the smaller gaps.
>> As a general principle, we should try not to have any gaps.  This also
>> extends to guests using more intelligence when deciding how to mutate
>> their physmaps.
> Yes, but in this case we are limited by the original e820 from the host.
> A DomU (without passthrough) will have all its memory allocated contiguously.

Ah yes - that is a legitimate restriction.

>  
>>>  As an example, this are the stats when 
>>> building a domain with 6048M of RAM:
>>>
>>> (XEN) Memory allocation stats:
>>> (XEN) Order 18: 5GB
>>> (XEN) Order 17: 512MB
>>> (XEN) Order 15: 256MB
>>> (XEN) Order 14: 128MB
>>> (XEN) Order 12: 16MB
>>> (XEN) Order 10: 8MB
>>> (XEN) Order  9: 4MB
>>> (XEN) Order  8: 2MB
>>> (XEN) Order  7: 1MB
>>> (XEN) Order  6: 512KB
>>> (XEN) Order  5: 256KB
>>> (XEN) Order  4: 128KB
>>> (XEN) Order  3: 64KB
>>> (XEN) Order  2: 32KB
>>> (XEN) Order  1: 16KB
>>> (XEN) Order  0: 4KB
>>>
>>> IMHO, they are quite good.
>> What are the RAM characteristics of the host?  Do you have any idea what
>> the hap superpage characteristics are like after the guest has booted?
> This is the host RAM map:
>
> (XEN)  0000000000000000 - 000000000009c800 (usable)
> (XEN)  000000000009c800 - 00000000000a0000 (reserved)
> (XEN)  00000000000e0000 - 0000000000100000 (reserved)
> (XEN)  0000000000100000 - 00000000ad662000 (usable)
> (XEN)  00000000ad662000 - 00000000adb1f000 (reserved)
> (XEN)  00000000adb1f000 - 00000000b228b000 (usable)
> (XEN)  00000000b228b000 - 00000000b2345000 (reserved)
> (XEN)  00000000b2345000 - 00000000b236a000 (ACPI data)
> (XEN)  00000000b236a000 - 00000000b2c9a000 (ACPI NVS)
> (XEN)  00000000b2c9a000 - 00000000b2fff000 (reserved)
> (XEN)  00000000b2fff000 - 00000000b3000000 (usable)
> (XEN)  00000000b3800000 - 00000000b8000000 (reserved)
> (XEN)  00000000f8000000 - 00000000fc000000 (reserved)
> (XEN)  00000000fec00000 - 00000000fec01000 (reserved)
> (XEN)  00000000fed00000 - 00000000fed04000 (reserved)
> (XEN)  00000000fed1c000 - 00000000fed20000 (reserved)
> (XEN)  00000000fee00000 - 00000000fee01000 (reserved)
> (XEN)  00000000ff000000 - 0000000100000000 (reserved)
> (XEN)  0000000100000000 - 0000000247000000 (usable)
>
> No idea about the HAP superpage characteristics, how can I fetch this 
> information? (I know I can dump the guest EPT tables, but that just 
> saturates the console).

Not easily, and also not the first time I have run into this problem. 
We really should have at least a debug way of identifying this.

For now, you can count how many times ept_split_super_page() gets called
at each level.
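
Something as crude as this would do for a one-off experiment (sketch;
assumes ept_split_super_page() has the level being split to hand, which
IIRC it does):

    /* p2m-ept.c, file scope: */
    static unsigned long split_count[4];

    /* At the top of ept_split_super_page(): */
    split_count[level]++;

and then dump split_count[] once dom0 has finished booting (or from a
keyhandler), rather than printing on every split and drowning the console.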

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH RFC 07/12] xen/x86: parse Dom0 kernel for PVHv2
  2016-07-29 16:29 ` [PATCH RFC 07/12] xen/x86: parse Dom0 kernel for PVHv2 Roger Pau Monne
@ 2016-09-26 16:16   ` Jan Beulich
  2016-09-26 17:11     ` Roger Pau Monne
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2016-09-26 16:16 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, xen-devel

>>> On 29.07.16 at 18:29, <roger.pau@citrix.com> wrote:
> Introduce a helper to parse the Dom0 kernel.

It would help if you clarified why this needs to be different from
the PV ELF parsing - to me, ELF is ELF, so no duplication should
be necessary from a very high level perspective.

Jan



* Re: [PATCH RFC 08/12] xen/x86: setup PVHv2 Dom0 CPUs
  2016-07-29 16:29 ` [PATCH RFC 08/12] xen/x86: setup PVHv2 Dom0 CPUs Roger Pau Monne
@ 2016-09-26 16:19   ` Jan Beulich
  2016-09-26 17:05     ` Roger Pau Monne
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2016-09-26 16:19 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, xen-devel

>>> On 29.07.16 at 18:29, <roger.pau@citrix.com> wrote:
> Initialize Dom0 BSP/APs and setup the memory and IO permissions.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> The logic used to set up the CPUID leaves is extremely simplistic (and
> probably wrong for hardware different from mine). I'm not sure what the
> best way to deal with this is; the code that currently sets the CPUID leaves
> for HVM guests lives in libxc, maybe moving it to xen/common would be better?

Why don't you just set them from their respective hardware values?

> +static int __init hvm_setup_cpus(struct domain *d, paddr_t entry,
> +                                 paddr_t start_info)
> +{
> +    vcpu_hvm_context_t cpu_ctx;
> +    struct vcpu *v = d->vcpu[0];
> +    int cpu, i, rc;
> +    struct {
> +        uint32_t index;
> +        uint32_t count;
> +    } cpuid_leaves[] = {
> +        {0, XEN_CPUID_INPUT_UNUSED},
> +        {1, XEN_CPUID_INPUT_UNUSED},
> +        {2, XEN_CPUID_INPUT_UNUSED},
> +        {4, 0},
> +        {4, 1},
> +        {4, 2},
> +        {4, 3},
> +        {4, 4},
> +        {7, 0},
> +        {0xa, XEN_CPUID_INPUT_UNUSED},
> +        {0xd, 0},
> +        {0x80000000, XEN_CPUID_INPUT_UNUSED},
> +        {0x80000001, XEN_CPUID_INPUT_UNUSED},
> +        {0x80000002, XEN_CPUID_INPUT_UNUSED},
> +        {0x80000003, XEN_CPUID_INPUT_UNUSED},
> +        {0x80000004, XEN_CPUID_INPUT_UNUSED},
> +        {0x80000005, XEN_CPUID_INPUT_UNUSED},
> +        {0x80000006, XEN_CPUID_INPUT_UNUSED},
> +        {0x80000007, XEN_CPUID_INPUT_UNUSED},
> +        {0x80000008, XEN_CPUID_INPUT_UNUSED},
> +    };
> +
> +    printk("** Setting up BSP/APs **\n");
> +
> +    cpu = v->processor;
> +    for ( i = 1; i < d->max_vcpus; i++ )
> +    {
> +        cpu = cpumask_cycle(cpu, &dom0_cpus);
> +        setup_dom0_vcpu(d, i, cpu);
> +    }
> +
> +    memset(&cpu_ctx, 0, sizeof(cpu_ctx));
> +
> +    cpu_ctx.mode = VCPU_HVM_MODE_32B;
> +
> +    cpu_ctx.cpu_regs.x86_32.ebx = start_info;
> +    cpu_ctx.cpu_regs.x86_32.eip = entry;
> +    cpu_ctx.cpu_regs.x86_32.cr0 = X86_CR0_PE | X86_CR0_ET;
> +
> +    cpu_ctx.cpu_regs.x86_32.cs_limit = ~0u;
> +    cpu_ctx.cpu_regs.x86_32.ds_limit = ~0u;
> +    cpu_ctx.cpu_regs.x86_32.ss_limit = ~0u;
> +    cpu_ctx.cpu_regs.x86_32.tr_limit = 0x67;
> +    cpu_ctx.cpu_regs.x86_32.cs_ar = 0xc9b;
> +    cpu_ctx.cpu_regs.x86_32.ds_ar = 0xc93;
> +    cpu_ctx.cpu_regs.x86_32.ss_ar = 0xc93;
> +    cpu_ctx.cpu_regs.x86_32.tr_ar = 0x8b;

Much of this should be power on state of a HVM vCPU anyway, so
it's not immediately clear why all of this is needed.

Jan


* Re: [PATCH RFC 09/12] xen/x86: setup PVHv2 Dom0 ACPI tables
  2016-07-29 16:29 ` [PATCH RFC 09/12] xen/x86: setup PVHv2 Dom0 ACPI tables Roger Pau Monne
@ 2016-09-26 16:21   ` Jan Beulich
  0 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2016-09-26 16:21 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, xen-devel

>>> On 29.07.16 at 18:29, <roger.pau@citrix.com> wrote:
> This maps all the regions in the e820 marked as E820_ACPI or E820_NVS to
> Dom0 1:1. It also shadows the page(s) where the native MADT is placed by
> mapping a RAM page over it, copying the original data and modifying it
> afterwards in order to represent the real CPU topology exposed to Dom0.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> FWIW, I think that the current approach I've used to craft the MADT is not
> the best one; IMHO it would be better to place the MADT at the end of the
> E820_ACPI region (expanding its size by one page) and modify the XSDT/RSDT
> to point to it. That way we avoid shadowing any other ACPI data that might
> be in the same page as the native MADT (and that needs to be modified by
> Dom0).

I think I agree. Problem is that you can't be sure there is any space
left after the E820_ACPI region, so I think the placement aspect
would need relaxation.
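
To make the idea concrete, a rough sketch (not what the series currently
does): append the crafted MADT right after the E820_ACPI region, grow the
region by a page so Dom0 sees it as ACPI data, and fix up the checksum. The
e820/ACPI structure fields are the usual ones; the helpers, the placement
policy and the check that the following page is actually free are all
assumptions left to the caller here.

    /* Recompute the checksum of an ACPI table (byte sum must become 0). */
    static void acpi_set_checksum(struct acpi_table_header *t)
    {
        uint8_t sum = 0, *p = (uint8_t *)t;
        unsigned int i;

        t->checksum = 0;
        for ( i = 0; i < t->length; i++ )
            sum += p[i];
        t->checksum = -sum;
    }

    /*
     * Sketch: place the crafted MADT in the page right after the E820_ACPI
     * entry 'e' and extend the entry so Dom0 maps it 1:1 like the rest.
     */
    static paddr_t place_madt(struct e820entry *e, unsigned long madt_size)
    {
        paddr_t madt_addr = e->addr + e->size;

        if ( madt_size > PAGE_SIZE )
            return 0;

        e->size += PAGE_SIZE;

        return madt_addr;
    }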

Jan


* Re: [PATCH RFC 01/12] PVHv2 Dom0
  2016-07-29 16:28 [PATCH RFC 01/12] PVHv2 Dom0 Roger Pau Monne
                   ` (12 preceding siblings ...)
  2016-07-29 16:38 ` [PATCH RFC 01/12] PVHv2 Dom0 Roger Pau Monne
@ 2016-09-26 16:25 ` Jan Beulich
  2016-09-26 17:12   ` Roger Pau Monne
  13 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2016-09-26 16:25 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel

>>> On 29.07.16 at 18:28, <roger.pau@citrix.com> wrote:
> This is a very rough implementation of a PVHv2 Dom0. There are still a bunch 
> of things that will not work properly (like SR-IOV, MSI, MSI-X...), but it 
> *should* be able to boot a Dom0 kernel and it doesn't require using PIRQs.

It's been a long time since you had posted this, but I thought I'd
take a coarse look nevertheless. Basic concept looks okay, but of
course there are many rough edges.

Jan



* Re: [PATCH RFC 08/12] xen/x86: setup PVHv2 Dom0 CPUs
  2016-09-26 16:19   ` Jan Beulich
@ 2016-09-26 17:05     ` Roger Pau Monne
  2016-09-27  8:10       ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Roger Pau Monne @ 2016-09-26 17:05 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On Mon, Sep 26, 2016 at 10:19:04AM -0600, Jan Beulich wrote:
> >>> On 29.07.16 at 18:29, <roger.pau@citrix.com> wrote:
> > Initialize Dom0 BSP/APs and setup the memory and IO permissions.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Cc: Jan Beulich <jbeulich@suse.com>
> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > ---
> > The logic used to set up the CPUID leaves is extremely simplistic (and
> > probably wrong for hardware different from mine). I'm not sure what the
> > best way to deal with this is; the code that currently sets the CPUID leaves
> > for HVM guests lives in libxc, maybe moving it to xen/common would be better?
> 
> Why don't you just set them from their respective hardware values?

That's what I'm already doing in this patch, but I'm not sure what should be 
removed (at least virt extensions should be removed, but I guess there's 
other stuff that we might want to remove too).
 
> > +static int __init hvm_setup_cpus(struct domain *d, paddr_t entry,
> > +                                 paddr_t start_info)
> > +{
> > +    vcpu_hvm_context_t cpu_ctx;
> > +    struct vcpu *v = d->vcpu[0];
> > +    int cpu, i, rc;
> > +    struct {
> > +        uint32_t index;
> > +        uint32_t count;
> > +    } cpuid_leaves[] = {
> > +        {0, XEN_CPUID_INPUT_UNUSED},
> > +        {1, XEN_CPUID_INPUT_UNUSED},
> > +        {2, XEN_CPUID_INPUT_UNUSED},
> > +        {4, 0},
> > +        {4, 1},
> > +        {4, 2},
> > +        {4, 3},
> > +        {4, 4},
> > +        {7, 0},
> > +        {0xa, XEN_CPUID_INPUT_UNUSED},
> > +        {0xd, 0},
> > +        {0x80000000, XEN_CPUID_INPUT_UNUSED},
> > +        {0x80000001, XEN_CPUID_INPUT_UNUSED},
> > +        {0x80000002, XEN_CPUID_INPUT_UNUSED},
> > +        {0x80000003, XEN_CPUID_INPUT_UNUSED},
> > +        {0x80000004, XEN_CPUID_INPUT_UNUSED},
> > +        {0x80000005, XEN_CPUID_INPUT_UNUSED},
> > +        {0x80000006, XEN_CPUID_INPUT_UNUSED},
> > +        {0x80000007, XEN_CPUID_INPUT_UNUSED},
> > +        {0x80000008, XEN_CPUID_INPUT_UNUSED},
> > +    };
> > +
> > +    printk("** Setting up BSP/APs **\n");
> > +
> > +    cpu = v->processor;
> > +    for ( i = 1; i < d->max_vcpus; i++ )
> > +    {
> > +        cpu = cpumask_cycle(cpu, &dom0_cpus);
> > +        setup_dom0_vcpu(d, i, cpu);
> > +    }
> > +
> > +    memset(&cpu_ctx, 0, sizeof(cpu_ctx));
> > +
> > +    cpu_ctx.mode = VCPU_HVM_MODE_32B;
> > +
> > +    cpu_ctx.cpu_regs.x86_32.ebx = start_info;
> > +    cpu_ctx.cpu_regs.x86_32.eip = entry;
> > +    cpu_ctx.cpu_regs.x86_32.cr0 = X86_CR0_PE | X86_CR0_ET;
> > +
> > +    cpu_ctx.cpu_regs.x86_32.cs_limit = ~0u;
> > +    cpu_ctx.cpu_regs.x86_32.ds_limit = ~0u;
> > +    cpu_ctx.cpu_regs.x86_32.ss_limit = ~0u;
> > +    cpu_ctx.cpu_regs.x86_32.tr_limit = 0x67;
> > +    cpu_ctx.cpu_regs.x86_32.cs_ar = 0xc9b;
> > +    cpu_ctx.cpu_regs.x86_32.ds_ar = 0xc93;
> > +    cpu_ctx.cpu_regs.x86_32.ss_ar = 0xc93;
> > +    cpu_ctx.cpu_regs.x86_32.tr_ar = 0x8b;
> 
> Much of this should be power on state of a HVM vCPU anyway, so
> it's not immediately clear why all of this is needed.

Oh, since I couldn't find any reference to the power-on state of an HVM
vCPU I thought it might be better to set it all here, so that it's clear
what the state on power-on is.

Roger.


* Re: [PATCH RFC 07/12] xen/x86: parse Dom0 kernel for PVHv2
  2016-09-26 16:16   ` Jan Beulich
@ 2016-09-26 17:11     ` Roger Pau Monne
  0 siblings, 0 replies; 49+ messages in thread
From: Roger Pau Monne @ 2016-09-26 17:11 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On Mon, Sep 26, 2016 at 10:16:19AM -0600, Jan Beulich wrote:
> >>> On 29.07.16 at 18:29, <roger.pau@citrix.com> wrote:
> > Introduce a helper to parse the Dom0 kernel.
> 
> It would help if you clarified why this needs to be different from
> the PV ELF parsing - to me, ELF is ELF, so no duplication should
> be necessary from a very high level perspective.

AFAICT, the current ELF loader code in the PV Dom0 builder uses virtual
addresses, and the HVM Dom0 builder only uses physical addresses. I agree 
that it could probably be made to work for both, but it's going to make the 
code more complex, and I really prefer the HVM Dom0 builder to be as clean 
and simple as possible.
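
To make the distinction concrete, a toy sketch of loading by physical
address: copy each PT_LOAD segment straight to its guest-physical load
address, since no guest page tables exist yet. Standard ELF types and
fields are used; hvm_copy_to_phys() is only a stand-in for whatever
copy-into-physmap primitive would really be used, not an existing function.

    struct domain;

    /* Assumed primitive: copy 'len' bytes into the guest physmap at 'dst'. */
    extern int hvm_copy_to_phys(struct domain *d, paddr_t dst,
                                const void *src, size_t len);

    static int load_elf_by_paddr(struct domain *d, const Elf64_Ehdr *ehdr)
    {
        const Elf64_Phdr *phdr =
            (const Elf64_Phdr *)((const char *)ehdr + ehdr->e_phoff);
        unsigned int i;

        for ( i = 0; i < ehdr->e_phnum; i++ )
        {
            if ( phdr[i].p_type != PT_LOAD )
                continue;

            /* p_paddr is used directly: no virtual addresses involved. */
            if ( hvm_copy_to_phys(d, phdr[i].p_paddr,
                                  (const char *)ehdr + phdr[i].p_offset,
                                  phdr[i].p_filesz) )
                return -1;
            /* p_memsz > p_filesz (the BSS) would additionally need zeroing. */
        }

        return 0;
    }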

Roger.


* Re: [PATCH RFC 01/12] PVHv2 Dom0
  2016-09-26 16:25 ` Jan Beulich
@ 2016-09-26 17:12   ` Roger Pau Monne
  2016-09-26 17:55     ` Konrad Rzeszutek Wilk
  2016-09-27  8:11     ` Jan Beulich
  0 siblings, 2 replies; 49+ messages in thread
From: Roger Pau Monne @ 2016-09-26 17:12 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Mon, Sep 26, 2016 at 10:25:06AM -0600, Jan Beulich wrote:
> >>> On 29.07.16 at 18:28, <roger.pau@citrix.com> wrote:
> > This is a very rough implementation of a PVHv2 Dom0. There are still a bunch 
> > of things that will not work properly (like SR-IOV, MSI, MSI-X...), but it 
> > *should* be able to boot a Dom0 kernel and it doesn't require using PIRQs.
> 
> It's been a long time since you had posted this, but I thought I'd
> take a coarse look nevertheless. Basic concept looks okay, but of
> course there are many rough edges.

Yes, sorry, it took longer than expected. I've just rebased all the HVM Dom0 
code today and I will post a new version before the end of the week, this 
time with PCIe, MSI and MSI-X support.

Roger.


* Re: [PATCH RFC 01/12] PVHv2 Dom0
  2016-09-26 17:12   ` Roger Pau Monne
@ 2016-09-26 17:55     ` Konrad Rzeszutek Wilk
  2016-09-27  8:11     ` Jan Beulich
  1 sibling, 0 replies; 49+ messages in thread
From: Konrad Rzeszutek Wilk @ 2016-09-26 17:55 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Jan Beulich

On Mon, Sep 26, 2016 at 07:12:39PM +0200, Roger Pau Monne wrote:
> On Mon, Sep 26, 2016 at 10:25:06AM -0600, Jan Beulich wrote:
> > >>> On 29.07.16 at 18:28, <roger.pau@citrix.com> wrote:
> > > This is a very rough implementation of a PVHv2 Dom0. There are still a bunch 
> > > of things that will not work properly (like SR-IOV, MSI, MSI-X...), but it 
> > > *should* be able to boot a Dom0 kernel and it doesn't require using PIRQs.
> > 
> > It's been a long time since you had posted this, but I thought I'd
> > take a coarse look nevertheless. Basic concept looks okay, but of
> > course there are many rough edges.
> 
> Yes, sorry, it took longer than expected. I've just rebased all the HVM Dom0 
> code today and I will post a new version before the end of the week, this 
> time with PCIe, MSI and MSI-X support.

And please CC me and Boris on them.

Thanks.


* Re: [PATCH RFC 08/12] xen/x86: setup PVHv2 Dom0 CPUs
  2016-09-26 17:05     ` Roger Pau Monne
@ 2016-09-27  8:10       ` Jan Beulich
  0 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2016-09-27  8:10 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, xen-devel

>>> On 26.09.16 at 19:05, <roger.pau@citrix.com> wrote:
> On Mon, Sep 26, 2016 at 10:19:04AM -0600, Jan Beulich wrote:
>> >>> On 29.07.16 at 18:29, <roger.pau@citrix.com> wrote:
>> > Initialize Dom0 BSP/APs and setup the memory and IO permissions.
>> > 
>> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>> > ---
>> > Cc: Jan Beulich <jbeulich@suse.com>
>> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> > ---
>> > The logic used to set up the CPUID leaves is extremely simplistic (and
>> > probably wrong for hardware different from mine). I'm not sure what the
>> > best way to deal with this is; the code that currently sets the CPUID leaves
>> > for HVM guests lives in libxc, maybe moving it to xen/common would be better?
>> 
>> Why don't you just set them from their respective hardware values?
> 
> That's what I'm already doing in this patch, but I'm not sure what should be 
> removed (at least virt extensions should be removed, but I guess there's 
> other stuff that we might want to remove too).

Well, wouldn't the first step be to match current Dom0 behavior?
The reference for what to clear here would then be the HVM feature
set and the current behavior of pv_cpuid() for Dom0.

Virt extensions, since you mention them, are something I'm particularly
unsure needs clearing (at least mid to long term), since nested
mode could arguably be usable by Dom0.
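
As a purely illustrative sketch of that kind of filtering: start from the
host's CPUID values and clear what Dom0 must not see. Only the VMX/SVM bits
are shown as examples, and cpuid_count() is assumed to return the host
leaves; this is not the leaf handling the series implements.

    #define CPUID1_ECX_VMX    (1u << 5)   /* leaf 1, ECX: Intel VT-x */
    #define CPUID_E1_ECX_SVM  (1u << 2)   /* leaf 0x80000001, ECX: AMD SVM */

    static void dom0_hvm_cpuid(unsigned int leaf, unsigned int subleaf,
                               unsigned int *eax, unsigned int *ebx,
                               unsigned int *ecx, unsigned int *edx)
    {
        /* Start from the raw hardware values. */
        cpuid_count(leaf, subleaf, eax, ebx, ecx, edx);

        switch ( leaf )
        {
        case 1:
            *ecx &= ~CPUID1_ECX_VMX;      /* hide VT-x from Dom0 */
            break;
        case 0x80000001:
            *ecx &= ~CPUID_E1_ECX_SVM;    /* hide SVM from Dom0 */
            break;
        }
    }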

Jan


* Re: [PATCH RFC 01/12] PVHv2 Dom0
  2016-09-26 17:12   ` Roger Pau Monne
  2016-09-26 17:55     ` Konrad Rzeszutek Wilk
@ 2016-09-27  8:11     ` Jan Beulich
  1 sibling, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2016-09-27  8:11 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel

>>> On 26.09.16 at 19:12, <roger.pau@citrix.com> wrote:
> On Mon, Sep 26, 2016 at 10:25:06AM -0600, Jan Beulich wrote:
>> >>> On 29.07.16 at 18:28, <roger.pau@citrix.com> wrote:
>> > This is a very rough implementation of a PVHv2 Dom0. There are still a bunch 
>> > of things that will not work properly (like SR-IOV, MSI, MSI-X...), but it 
>> > *should* be able to boot a Dom0 kernel and it doesn't require using PIRQs.
>> 
>> It's been a long time since you had posted this, but I thought I'd
>> take a coarse look nevertheless. Basic concept looks okay, but of
>> course there are many rough edges.
> 
> Yes, sorry, it took longer than expected.

Well, it's not you who should be sorry, but me (for taking so long to
respond at all).

Jan



end of thread, other threads:[~2016-09-27  8:11 UTC | newest]

Thread overview: 49+ messages
2016-07-29 16:28 [PATCH RFC 01/12] PVHv2 Dom0 Roger Pau Monne
2016-07-29 16:28 ` [PATCH RFC 01/12] x86/paging: introduce paging_set_allocation Roger Pau Monne
2016-07-29 16:47   ` Andrew Cooper
2016-08-02  9:47     ` Roger Pau Monne
2016-08-02 15:49       ` Roger Pau Monne
2016-08-02 16:12         ` Jan Beulich
2016-08-03 15:11           ` George Dunlap
2016-08-03 15:25             ` Jan Beulich
2016-08-03 15:28               ` George Dunlap
2016-08-03 15:37                 ` Jan Beulich
2016-08-03 15:59                   ` George Dunlap
2016-08-03 16:00                   ` Roger Pau Monne
2016-08-03 16:15                     ` Jan Beulich
2016-08-03 16:24                       ` Roger Pau Monne
2016-08-04  6:19                         ` Jan Beulich
2016-08-01  8:57   ` Tim Deegan
2016-07-29 16:28 ` [PATCH RFC 02/12] xen/x86: split the setup of Dom0 permissions to a function Roger Pau Monne
2016-07-29 16:28 ` [PATCH RFC 03/12] xen/x86: allow the emulated APICs to be enbled for the hardware domain Roger Pau Monne
2016-07-29 17:50   ` Andrew Cooper
2016-08-01 11:23     ` Roger Pau Monne
2016-07-29 16:28 ` [PATCH RFC 04/12] xen/x86: split Dom0 build into PV and PVHv2 Roger Pau Monne
2016-07-29 17:57   ` Andrew Cooper
2016-08-01 11:36     ` Roger Pau Monne
2016-08-04 18:28       ` Andrew Cooper
2016-07-29 16:29 ` [PATCH RFC 05/12] xen/x86: make print_e820_memory_map global Roger Pau Monne
2016-07-29 17:57   ` Andrew Cooper
2016-07-29 16:29 ` [PATCH RFC 06/12] xen/x86: populate PVHv2 Dom0 physical memory map Roger Pau Monne
2016-07-29 19:04   ` Andrew Cooper
2016-08-02  9:19     ` Roger Pau Monne
2016-08-04 18:43       ` Andrew Cooper
2016-08-05  9:40         ` Roger Pau Monne
2016-08-11 18:28           ` Andrew Cooper
2016-07-29 16:29 ` [PATCH RFC 07/12] xen/x86: parse Dom0 kernel for PVHv2 Roger Pau Monne
2016-09-26 16:16   ` Jan Beulich
2016-09-26 17:11     ` Roger Pau Monne
2016-07-29 16:29 ` [PATCH RFC 08/12] xen/x86: setup PVHv2 Dom0 CPUs Roger Pau Monne
2016-09-26 16:19   ` Jan Beulich
2016-09-26 17:05     ` Roger Pau Monne
2016-09-27  8:10       ` Jan Beulich
2016-07-29 16:29 ` [PATCH RFC 09/12] xen/x86: setup PVHv2 Dom0 ACPI tables Roger Pau Monne
2016-09-26 16:21   ` Jan Beulich
2016-07-29 16:29 ` [PATCH RFC 10/12] xen/dcpi: add a dpci passthrough handler for hardware domain Roger Pau Monne
2016-07-29 16:29 ` [PATCH RFC 11/12] xen/x86: allow a PVHv2 Dom0 to register PCI devices with Xen Roger Pau Monne
2016-07-29 16:29 ` [PATCH RFC 12/12] xen/x86: route legacy PCI interrupts to Dom0 Roger Pau Monne
2016-07-29 16:38 ` [PATCH RFC 01/12] PVHv2 Dom0 Roger Pau Monne
2016-09-26 16:25 ` Jan Beulich
2016-09-26 17:12   ` Roger Pau Monne
2016-09-26 17:55     ` Konrad Rzeszutek Wilk
2016-09-27  8:11     ` Jan Beulich
