All of lore.kernel.org
 help / color / mirror / Atom feed
* [V8 PATCH 0/8] pvh dom0....
@ 2014-03-22  1:39 Mukesh Rathor
  2014-03-22  1:39 ` [V8 PATCH 1/8] pvh dom0: move some pv specific code to static functions Mukesh Rathor
                   ` (10 more replies)
  0 siblings, 11 replies; 51+ messages in thread
From: Mukesh Rathor @ 2014-03-22  1:39 UTC (permalink / raw)
  To: xen-devel; +Cc: George.Dunlap, keir.xen, tim, JBeulich

Hi all,

Finally, please find V8 of dom0 PVH patches based on commit bc69aaf.

  git tree: git://oss.oracle.com/git/mrathor/xen.git  branch: dom0pvh-v8


Following changes from V7 :

Patch that added support for XENMEM_add_to_physmap_range is dropped.

Patch 2: comment changes (based on feedback from Konrad):

  -            /* Unused RAM areas are marked UNUSABLE, so skip it too */
  +            /* Unused RAM areas are marked UNUSABLE, so skip them too */

-    /* If the e820 ended under 4GB, we must map the remaining space upto 4GB */
+    /* 
+     * Some BIOSes may not report io space above ram that is less than 4GB. So
+     * we map any non-ram upto 4GB.
+     */

Jan, I retained Reviewed-by. If above voids the "reviewed-by", I'll just undo
the comment changes. Please lmk if that's the case.

Patch 3:

 Tim, the caller of set_foreign_p2m_entry got moved to the same file p2m.c 
 so I made the function static and removed it's prototype from p2m.h. No 
 other change. I retained your ack.


Caveat:

With these patches, an HVM guest during boot intermittantly takes
unresolved fault in load_module() path. This for PVH dom0 and not PV, and
latest upstream kernel. I was not able to reproduce with older stock 
fedora 19 kernel. So, it appears the problem might be on the guest side, 
and not xen. Hence I decided to go ahead and submit these while I debug that.


Pre-requisite:  xen patch "pvh: disallow PHYSDEVOP_pirq_eoi_gmfn_v2/v1"


thanks
Mukesh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [V8 PATCH 1/8] pvh dom0: move some pv specific code to static functions
  2014-03-22  1:39 [V8 PATCH 0/8] pvh dom0 Mukesh Rathor
@ 2014-03-22  1:39 ` Mukesh Rathor
  2014-03-22  1:39 ` [V8 PATCH 2/8] pvh dom0: construct_dom0 changes Mukesh Rathor
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 51+ messages in thread
From: Mukesh Rathor @ 2014-03-22  1:39 UTC (permalink / raw)
  To: xen-devel; +Cc: George.Dunlap, keir.xen, tim, JBeulich

In this preparatory patch, some pv specific code is
carved out into static functions. No functionality change.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/domain_build.c | 347 ++++++++++++++++++++++++--------------------
 1 file changed, 188 insertions(+), 159 deletions(-)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index 84ce392..f3ae9ad 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -307,6 +307,187 @@ static void __init process_dom0_ioports_disable(void)
     }
 }
 
+static __init void mark_pv_pt_pages_rdonly(struct domain *d,
+                                           l4_pgentry_t *l4start,
+                                           unsigned long vpt_start,
+                                           unsigned long nr_pt_pages)
+{
+    unsigned long count;
+    struct page_info *page;
+    l4_pgentry_t *pl4e;
+    l3_pgentry_t *pl3e;
+    l2_pgentry_t *pl2e;
+    l1_pgentry_t *pl1e;
+
+    pl4e = l4start + l4_table_offset(vpt_start);
+    pl3e = l4e_to_l3e(*pl4e);
+    pl3e += l3_table_offset(vpt_start);
+    pl2e = l3e_to_l2e(*pl3e);
+    pl2e += l2_table_offset(vpt_start);
+    pl1e = l2e_to_l1e(*pl2e);
+    pl1e += l1_table_offset(vpt_start);
+    for ( count = 0; count < nr_pt_pages; count++ )
+    {
+        l1e_remove_flags(*pl1e, _PAGE_RW);
+        page = mfn_to_page(l1e_get_pfn(*pl1e));
+
+        /* Read-only mapping + PGC_allocated + page-table page. */
+        page->count_info         = PGC_allocated | 3;
+        page->u.inuse.type_info |= PGT_validated | 1;
+
+        /* Top-level p.t. is pinned. */
+        if ( (page->u.inuse.type_info & PGT_type_mask) ==
+             (!is_pv_32on64_domain(d) ?
+              PGT_l4_page_table : PGT_l3_page_table) )
+        {
+            page->count_info        += 1;
+            page->u.inuse.type_info += 1 | PGT_pinned;
+        }
+
+        /* Iterate. */
+        if ( !((unsigned long)++pl1e & (PAGE_SIZE - 1)) )
+        {
+            if ( !((unsigned long)++pl2e & (PAGE_SIZE - 1)) )
+            {
+                if ( !((unsigned long)++pl3e & (PAGE_SIZE - 1)) )
+                    pl3e = l4e_to_l3e(*++pl4e);
+                pl2e = l3e_to_l2e(*pl3e);
+            }
+            pl1e = l2e_to_l1e(*pl2e);
+        }
+    }
+}
+
+static __init void setup_pv_physmap(struct domain *d, unsigned long pgtbl_pfn,
+                                    unsigned long v_start, unsigned long v_end,
+                                    unsigned long vphysmap_start,
+                                    unsigned long vphysmap_end,
+                                    unsigned long nr_pages)
+{
+    struct page_info *page = NULL;
+    l4_pgentry_t *pl4e, *l4start = map_domain_page(pgtbl_pfn);
+    l3_pgentry_t *pl3e = NULL;
+    l2_pgentry_t *pl2e = NULL;
+    l1_pgentry_t *pl1e = NULL;
+
+    if ( v_start <= vphysmap_end && vphysmap_start <= v_end )
+        panic("DOM0 P->M table overlaps initial mapping");
+
+    while ( vphysmap_start < vphysmap_end )
+    {
+        if ( d->tot_pages + ((round_pgup(vphysmap_end) - vphysmap_start)
+                             >> PAGE_SHIFT) + 3 > nr_pages )
+            panic("Dom0 allocation too small for initial P->M table");
+
+        if ( pl1e )
+        {
+            unmap_domain_page(pl1e);
+            pl1e = NULL;
+        }
+        if ( pl2e )
+        {
+            unmap_domain_page(pl2e);
+            pl2e = NULL;
+        }
+        if ( pl3e )
+        {
+            unmap_domain_page(pl3e);
+            pl3e = NULL;
+        }
+        pl4e = l4start + l4_table_offset(vphysmap_start);
+        if ( !l4e_get_intpte(*pl4e) )
+        {
+            page = alloc_domheap_page(d, 0);
+            if ( !page )
+                break;
+
+            /* No mapping, PGC_allocated + page-table page. */
+            page->count_info = PGC_allocated | 2;
+            page->u.inuse.type_info = PGT_l3_page_table | PGT_validated | 1;
+            pl3e = __map_domain_page(page);
+            clear_page(pl3e);
+            *pl4e = l4e_from_page(page, L4_PROT);
+        } else
+            pl3e = map_domain_page(l4e_get_pfn(*pl4e));
+
+        pl3e += l3_table_offset(vphysmap_start);
+        if ( !l3e_get_intpte(*pl3e) )
+        {
+            if ( cpu_has_page1gb &&
+                 !(vphysmap_start & ((1UL << L3_PAGETABLE_SHIFT) - 1)) &&
+                 vphysmap_end >= vphysmap_start + (1UL << L3_PAGETABLE_SHIFT) &&
+                 (page = alloc_domheap_pages(d,
+                                             L3_PAGETABLE_SHIFT - PAGE_SHIFT,
+                                             0)) != NULL )
+            {
+                *pl3e = l3e_from_page(page, L1_PROT|_PAGE_DIRTY|_PAGE_PSE);
+                vphysmap_start += 1UL << L3_PAGETABLE_SHIFT;
+                continue;
+            }
+            if ( (page = alloc_domheap_page(d, 0)) == NULL )
+                break;
+
+            /* No mapping, PGC_allocated + page-table page. */
+            page->count_info = PGC_allocated | 2;
+            page->u.inuse.type_info = PGT_l2_page_table | PGT_validated | 1;
+            pl2e = __map_domain_page(page);
+            clear_page(pl2e);
+            *pl3e = l3e_from_page(page, L3_PROT);
+        }
+        else
+           pl2e = map_domain_page(l3e_get_pfn(*pl3e));
+
+        pl2e += l2_table_offset(vphysmap_start);
+        if ( !l2e_get_intpte(*pl2e) )
+        {
+            if ( !(vphysmap_start & ((1UL << L2_PAGETABLE_SHIFT) - 1)) &&
+                 vphysmap_end >= vphysmap_start + (1UL << L2_PAGETABLE_SHIFT) &&
+                 (page = alloc_domheap_pages(d,
+                                             L2_PAGETABLE_SHIFT - PAGE_SHIFT,
+                                             0)) != NULL )
+            {
+                *pl2e = l2e_from_page(page, L1_PROT|_PAGE_DIRTY|_PAGE_PSE);
+                if ( opt_allow_superpage )
+                    get_superpage(page_to_mfn(page), d);
+                vphysmap_start += 1UL << L2_PAGETABLE_SHIFT;
+                continue;
+            }
+            if ( (page = alloc_domheap_page(d, 0)) == NULL )
+                break;
+
+            /* No mapping, PGC_allocated + page-table page. */
+            page->count_info = PGC_allocated | 2;
+            page->u.inuse.type_info = PGT_l1_page_table | PGT_validated | 1;
+            pl1e = __map_domain_page(page);
+            clear_page(pl1e);
+            *pl2e = l2e_from_page(page, L2_PROT);
+        }
+        else
+            pl1e = map_domain_page(l2e_get_pfn(*pl2e));
+
+        pl1e += l1_table_offset(vphysmap_start);
+        BUG_ON(l1e_get_intpte(*pl1e));
+        page = alloc_domheap_page(d, 0);
+        if ( !page )
+            break;
+
+        *pl1e = l1e_from_page(page, L1_PROT|_PAGE_DIRTY);
+        vphysmap_start += PAGE_SIZE;
+        vphysmap_start &= PAGE_MASK;
+    }
+    if ( !page )
+        panic("Not enough RAM for DOM0 P->M table");
+
+    if ( pl1e )
+        unmap_domain_page(pl1e);
+    if ( pl2e )
+        unmap_domain_page(pl2e);
+    if ( pl3e )
+        unmap_domain_page(pl3e);
+
+    unmap_domain_page(l4start);
+}
+
 int __init construct_dom0(
     struct domain *d,
     const module_t *image, unsigned long image_headroom,
@@ -706,43 +887,8 @@ int __init construct_dom0(
     }
 
     /* Pages that are part of page tables must be read only. */
-    l4tab = l4start + l4_table_offset(vpt_start);
-    l3start = l3tab = l4e_to_l3e(*l4tab);
-    l3tab += l3_table_offset(vpt_start);
-    l2start = l2tab = l3e_to_l2e(*l3tab);
-    l2tab += l2_table_offset(vpt_start);
-    l1start = l1tab = l2e_to_l1e(*l2tab);
-    l1tab += l1_table_offset(vpt_start);
-    for ( count = 0; count < nr_pt_pages; count++ ) 
-    {
-        l1e_remove_flags(*l1tab, _PAGE_RW);
-        page = mfn_to_page(l1e_get_pfn(*l1tab));
-
-        /* Read-only mapping + PGC_allocated + page-table page. */
-        page->count_info         = PGC_allocated | 3;
-        page->u.inuse.type_info |= PGT_validated | 1;
-
-        /* Top-level p.t. is pinned. */
-        if ( (page->u.inuse.type_info & PGT_type_mask) ==
-             (!is_pv_32on64_domain(d) ?
-              PGT_l4_page_table : PGT_l3_page_table) )
-        {
-            page->count_info        += 1;
-            page->u.inuse.type_info += 1 | PGT_pinned;
-        }
-
-        /* Iterate. */
-        if ( !((unsigned long)++l1tab & (PAGE_SIZE - 1)) )
-        {
-            if ( !((unsigned long)++l2tab & (PAGE_SIZE - 1)) )
-            {
-                if ( !((unsigned long)++l3tab & (PAGE_SIZE - 1)) )
-                    l3start = l3tab = l4e_to_l3e(*++l4tab);
-                l2start = l2tab = l3e_to_l2e(*l3tab);
-            }
-            l1start = l1tab = l2e_to_l1e(*l2tab);
-        }
-    }
+    if  ( is_pv_domain(d) )
+        mark_pv_pt_pages_rdonly(d, l4start, vpt_start, nr_pt_pages);
 
     /* Mask all upcalls... */
     for ( i = 0; i < XEN_LEGACY_MAX_VCPUS; i++ )
@@ -814,132 +960,15 @@ int __init construct_dom0(
              elf_64bit(&elf) ? 64 : 32, parms.pae ? "p" : "");
 
     count = d->tot_pages;
-    l4start = map_domain_page(pagetable_get_pfn(v->arch.guest_table));
-    l3tab = NULL;
-    l2tab = NULL;
-    l1tab = NULL;
+
     /* Set up the phys->machine table if not part of the initial mapping. */
-    if ( parms.p2m_base != UNSET_ADDR )
+    if ( is_pv_domain(d) && parms.p2m_base != UNSET_ADDR )
     {
-        unsigned long va = vphysmap_start;
-
-        if ( v_start <= vphysmap_end && vphysmap_start <= v_end )
-            panic("DOM0 P->M table overlaps initial mapping");
-
-        while ( va < vphysmap_end )
-        {
-            if ( d->tot_pages + ((round_pgup(vphysmap_end) - va)
-                                 >> PAGE_SHIFT) + 3 > nr_pages )
-                panic("Dom0 allocation too small for initial P->M table");
-
-            if ( l1tab )
-            {
-                unmap_domain_page(l1tab);
-                l1tab = NULL;
-            }
-            if ( l2tab )
-            {
-                unmap_domain_page(l2tab);
-                l2tab = NULL;
-            }
-            if ( l3tab )
-            {
-                unmap_domain_page(l3tab);
-                l3tab = NULL;
-            }
-            l4tab = l4start + l4_table_offset(va);
-            if ( !l4e_get_intpte(*l4tab) )
-            {
-                page = alloc_domheap_page(d, 0);
-                if ( !page )
-                    break;
-                /* No mapping, PGC_allocated + page-table page. */
-                page->count_info = PGC_allocated | 2;
-                page->u.inuse.type_info =
-                    PGT_l3_page_table | PGT_validated | 1;
-                l3tab = __map_domain_page(page);
-                clear_page(l3tab);
-                *l4tab = l4e_from_page(page, L4_PROT);
-            } else
-                l3tab = map_domain_page(l4e_get_pfn(*l4tab));
-            l3tab += l3_table_offset(va);
-            if ( !l3e_get_intpte(*l3tab) )
-            {
-                if ( cpu_has_page1gb &&
-                     !(va & ((1UL << L3_PAGETABLE_SHIFT) - 1)) &&
-                     vphysmap_end >= va + (1UL << L3_PAGETABLE_SHIFT) &&
-                     (page = alloc_domheap_pages(d,
-                                                 L3_PAGETABLE_SHIFT -
-                                                     PAGE_SHIFT,
-                                                 0)) != NULL )
-                {
-                    *l3tab = l3e_from_page(page,
-                                           L1_PROT|_PAGE_DIRTY|_PAGE_PSE);
-                    va += 1UL << L3_PAGETABLE_SHIFT;
-                    continue;
-                }
-                if ( (page = alloc_domheap_page(d, 0)) == NULL )
-                    break;
-                /* No mapping, PGC_allocated + page-table page. */
-                page->count_info = PGC_allocated | 2;
-                page->u.inuse.type_info =
-                    PGT_l2_page_table | PGT_validated | 1;
-                l2tab = __map_domain_page(page);
-                clear_page(l2tab);
-                *l3tab = l3e_from_page(page, L3_PROT);
-            }
-            else
-               l2tab = map_domain_page(l3e_get_pfn(*l3tab));
-            l2tab += l2_table_offset(va);
-            if ( !l2e_get_intpte(*l2tab) )
-            {
-                if ( !(va & ((1UL << L2_PAGETABLE_SHIFT) - 1)) &&
-                     vphysmap_end >= va + (1UL << L2_PAGETABLE_SHIFT) &&
-                     (page = alloc_domheap_pages(d,
-                                                 L2_PAGETABLE_SHIFT -
-                                                     PAGE_SHIFT,
-                                                 0)) != NULL )
-                {
-                    *l2tab = l2e_from_page(page,
-                                           L1_PROT|_PAGE_DIRTY|_PAGE_PSE);
-                    if ( opt_allow_superpage )
-                        get_superpage(page_to_mfn(page), d);
-                    va += 1UL << L2_PAGETABLE_SHIFT;
-                    continue;
-                }
-                if ( (page = alloc_domheap_page(d, 0)) == NULL )
-                    break;
-                /* No mapping, PGC_allocated + page-table page. */
-                page->count_info = PGC_allocated | 2;
-                page->u.inuse.type_info =
-                    PGT_l1_page_table | PGT_validated | 1;
-                l1tab = __map_domain_page(page);
-                clear_page(l1tab);
-                *l2tab = l2e_from_page(page, L2_PROT);
-            }
-            else
-                l1tab = map_domain_page(l2e_get_pfn(*l2tab));
-            l1tab += l1_table_offset(va);
-            BUG_ON(l1e_get_intpte(*l1tab));
-            page = alloc_domheap_page(d, 0);
-            if ( !page )
-                break;
-            *l1tab = l1e_from_page(page, L1_PROT|_PAGE_DIRTY);
-            va += PAGE_SIZE;
-            va &= PAGE_MASK;
-        }
-        if ( !page )
-            panic("Not enough RAM for DOM0 P->M table");
+        pfn = pagetable_get_pfn(v->arch.guest_table);
+        setup_pv_physmap(d, pfn, v_start, v_end, vphysmap_start, vphysmap_end,
+                         nr_pages);
     }
 
-    if ( l1tab )
-        unmap_domain_page(l1tab);
-    if ( l2tab )
-        unmap_domain_page(l2tab);
-    if ( l3tab )
-        unmap_domain_page(l3tab);
-    unmap_domain_page(l4start);
-
     /* Write the phys->machine and machine->phys table entries. */
     for ( pfn = 0; pfn < count; pfn++ )
     {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [V8 PATCH 2/8] pvh dom0: construct_dom0 changes
  2014-03-22  1:39 [V8 PATCH 0/8] pvh dom0 Mukesh Rathor
  2014-03-22  1:39 ` [V8 PATCH 1/8] pvh dom0: move some pv specific code to static functions Mukesh Rathor
@ 2014-03-22  1:39 ` Mukesh Rathor
  2014-03-26 19:05   ` George Dunlap
  2014-03-22  1:39 ` [V8 PATCH 3/8] pvh dom0: Introduce p2m_map_foreign Mukesh Rathor
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 51+ messages in thread
From: Mukesh Rathor @ 2014-03-22  1:39 UTC (permalink / raw)
  To: xen-devel; +Cc: George.Dunlap, keir.xen, tim, JBeulich

This patch changes construct_dom0() to boot in PVH mode.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/domain_build.c | 238 ++++++++++++++++++++++++++++++++++++++++----
 xen/arch/x86/mm/hap/hap.c   |  15 +++
 xen/include/asm-x86/hap.h   |   1 +
 3 files changed, 237 insertions(+), 17 deletions(-)

diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c
index f3ae9ad..6986ea6 100644
--- a/xen/arch/x86/domain_build.c
+++ b/xen/arch/x86/domain_build.c
@@ -35,6 +35,7 @@
 #include <asm/setup.h>
 #include <asm/bzimage.h> /* for bzimage_parse */
 #include <asm/io_apic.h>
+#include <asm/hap.h>
 
 #include <public/version.h>
 
@@ -307,6 +308,154 @@ static void __init process_dom0_ioports_disable(void)
     }
 }
 
+static __init void pvh_add_mem_mapping(struct domain *d, unsigned long gfn,
+                                       unsigned long mfn, unsigned long nr_mfns)
+{
+    unsigned long i;
+    for ( i = 0; i < nr_mfns; i++ )
+        if ( !set_mmio_p2m_entry(d, gfn + i, _mfn(mfn + i)) )
+            panic("Failed setting p2m. gfn:%lx mfn:%lx i:%ld\n", gfn, mfn, i);
+}
+
+/*
+ * Set the 1:1 map for all non-RAM regions for dom 0. Thus, dom0 will have
+ * the entire io region mapped in the EPT/NPT.
+ *
+ * pvh fixme: The following doesn't map MMIO ranges when they sit above the
+ *            highest E820 covered address.
+ */
+static __init void pvh_map_all_iomem(struct domain *d)
+{
+    unsigned long start_pfn, end_pfn, end = 0, start = 0;
+    const struct e820entry *entry;
+    unsigned int i, nump;
+
+    for ( i = 0, entry = e820.map; i < e820.nr_map; i++, entry++ )
+    {
+        end = entry->addr + entry->size;
+
+        if ( entry->type == E820_RAM || entry->type == E820_UNUSABLE ||
+             i == e820.nr_map - 1 )
+        {
+            start_pfn = PFN_DOWN(start);
+
+            /* Unused RAM areas are marked UNUSABLE, so skip them too */
+            if ( entry->type == E820_RAM || entry->type == E820_UNUSABLE )
+                end_pfn = PFN_UP(entry->addr);
+            else
+                end_pfn = PFN_UP(end);
+
+            if ( start_pfn < end_pfn )
+            {
+                nump = end_pfn - start_pfn;
+                /* Add pages to the mapping */
+                pvh_add_mem_mapping(d, start_pfn, start_pfn, nump);
+            }
+            start = end;
+        }
+    }
+
+    /* 
+     * Some BIOSes may not report io space above ram that is less than 4GB. So
+     * we map any non-ram upto 4GB.
+     */
+    if ( end < GB(4) )
+    {
+        start_pfn = PFN_UP(end);
+        end_pfn = (GB(4)) >> PAGE_SHIFT;
+        nump = end_pfn - start_pfn;
+        pvh_add_mem_mapping(d, start_pfn, start_pfn, nump);
+    }
+}
+
+static __init void dom0_update_physmap(struct domain *d, unsigned long pfn,
+                                   unsigned long mfn, unsigned long vphysmap_s)
+{
+    if ( is_pvh_domain(d) )
+    {
+        int rc = guest_physmap_add_page(d, pfn, mfn, 0);
+        BUG_ON(rc);
+        return;
+    }
+    if ( !is_pv_32on64_domain(d) )
+        ((unsigned long *)vphysmap_s)[pfn] = mfn;
+    else
+        ((unsigned int *)vphysmap_s)[pfn] = mfn;
+
+    set_gpfn_from_mfn(mfn, pfn);
+}
+
+static __init void pvh_fixup_page_tables_for_hap(struct vcpu *v,
+                                                 unsigned long v_start,
+                                                 unsigned long v_end)
+{
+    int i, j, k;
+    l4_pgentry_t *pl4e, *l4start;
+    l3_pgentry_t *pl3e;
+    l2_pgentry_t *pl2e;
+    l1_pgentry_t *pl1e;
+    unsigned long cr3_pfn;
+
+    ASSERT(paging_mode_enabled(v->domain));
+
+    l4start = map_domain_page(pagetable_get_pfn(v->arch.guest_table));
+
+    /* Clear entries prior to guest L4 start */
+    pl4e = l4start + l4_table_offset(v_start);
+    memset(l4start, 0, (unsigned long)pl4e - (unsigned long)l4start);
+
+    for ( ; pl4e <= l4start + l4_table_offset(v_end - 1); pl4e++ )
+    {
+        pl3e = map_l3t_from_l4e(*pl4e);
+        for ( i = 0; i < PAGE_SIZE / sizeof(*pl3e); i++, pl3e++ )
+        {
+            if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
+                continue;
+
+            pl2e = map_l2t_from_l3e(*pl3e);
+            for ( j = 0; j < PAGE_SIZE / sizeof(*pl2e); j++, pl2e++ )
+            {
+                if ( !(l2e_get_flags(*pl2e)  & _PAGE_PRESENT) )
+                    continue;
+
+                pl1e = map_l1t_from_l2e(*pl2e);
+                for ( k = 0; k < PAGE_SIZE / sizeof(*pl1e); k++, pl1e++ )
+                {
+                    if ( !(l1e_get_flags(*pl1e) & _PAGE_PRESENT) )
+                        continue;
+
+                    *pl1e = l1e_from_pfn(get_gpfn_from_mfn(l1e_get_pfn(*pl1e)),
+                                         l1e_get_flags(*pl1e));
+                }
+                unmap_domain_page(pl1e);
+                *pl2e = l2e_from_pfn(get_gpfn_from_mfn(l2e_get_pfn(*pl2e)),
+                                     l2e_get_flags(*pl2e));
+            }
+            unmap_domain_page(pl2e);
+            *pl3e = l3e_from_pfn(get_gpfn_from_mfn(l3e_get_pfn(*pl3e)),
+                                 l3e_get_flags(*pl3e));
+        }
+        unmap_domain_page(pl3e);
+        *pl4e = l4e_from_pfn(get_gpfn_from_mfn(l4e_get_pfn(*pl4e)),
+                             l4e_get_flags(*pl4e));
+    }
+
+    /* Clear entries post guest L4. */
+    if ( (unsigned long)pl4e & (PAGE_SIZE - 1) )
+        memset(pl4e, 0, PAGE_SIZE - ((unsigned long)pl4e & (PAGE_SIZE - 1)));
+
+    unmap_domain_page(l4start);
+
+    cr3_pfn = get_gpfn_from_mfn(paddr_to_pfn(v->arch.cr3));
+    v->arch.hvm_vcpu.guest_cr[3] = pfn_to_paddr(cr3_pfn);
+
+    /*
+     * Finally, we update the paging modes (hap_update_paging_modes). This will
+     * create monitor_table for us, update v->arch.cr3, and update vmcs.cr3.
+     */
+    paging_update_paging_modes(v);
+}
+
 static __init void mark_pv_pt_pages_rdonly(struct domain *d,
                                            l4_pgentry_t *l4start,
                                            unsigned long vpt_start,
@@ -516,6 +665,8 @@ int __init construct_dom0(
     l3_pgentry_t *l3tab = NULL, *l3start = NULL;
     l2_pgentry_t *l2tab = NULL, *l2start = NULL;
     l1_pgentry_t *l1tab = NULL, *l1start = NULL;
+    paddr_t shared_info_paddr = 0;
+    u32 save_pvh_pg_mode = 0;
 
     /*
      * This fully describes the memory layout of the initial domain. All 
@@ -593,12 +744,21 @@ int __init construct_dom0(
         goto out;
     }
 
-    if ( parms.elf_notes[XEN_ELFNOTE_SUPPORTED_FEATURES].type != XEN_ENT_NONE &&
-         !test_bit(XENFEAT_dom0, parms.f_supported) )
+    if ( parms.elf_notes[XEN_ELFNOTE_SUPPORTED_FEATURES].type != XEN_ENT_NONE )
     {
-        printk("Kernel does not support Dom0 operation\n");
-        rc = -EINVAL;
-        goto out;
+        if ( !test_bit(XENFEAT_dom0, parms.f_supported) )
+        {
+            printk("Kernel does not support Dom0 operation\n");
+            rc = -EINVAL;
+            goto out;
+        }
+        if ( is_pvh_domain(d) &&
+             !test_bit(XENFEAT_hvm_callback_vector, parms.f_supported) )
+        {
+            printk("Kernel does not support PVH mode\n");
+            rc = -EINVAL;
+            goto out;
+        }
     }
 
     if ( compat32 )
@@ -663,6 +823,13 @@ int __init construct_dom0(
     vstartinfo_end   = (vstartinfo_start +
                         sizeof(struct start_info) +
                         sizeof(struct dom0_vga_console_info));
+
+    if ( is_pvh_domain(d) )
+    {
+        shared_info_paddr = round_pgup(vstartinfo_end) - v_start;
+        vstartinfo_end   += PAGE_SIZE;
+    }
+
     vpt_start        = round_pgup(vstartinfo_end);
     for ( nr_pt_pages = 2; ; nr_pt_pages++ )
     {
@@ -903,6 +1070,13 @@ int __init construct_dom0(
         (void)alloc_vcpu(d, i, cpu);
     }
 
+    /*
+     * pvh: we temporarily disable paging mode so that we can build cr3 needed
+     * to run on dom0's page tables.
+     */
+    save_pvh_pg_mode = d->arch.paging.mode;
+    d->arch.paging.mode = 0;
+
     /* Set up CR3 value for write_ptbase */
     if ( paging_mode_enabled(d) )
         paging_update_paging_modes(v);
@@ -969,6 +1143,15 @@ int __init construct_dom0(
                          nr_pages);
     }
 
+    if ( is_pvh_domain(d) )
+        hap_set_pvh_alloc_for_dom0(d, nr_pages);
+
+    /*
+     * We enable paging mode again so guest_physmap_add_page will do the
+     * right thing for us.
+     */
+    d->arch.paging.mode = save_pvh_pg_mode;
+
     /* Write the phys->machine and machine->phys table entries. */
     for ( pfn = 0; pfn < count; pfn++ )
     {
@@ -985,11 +1168,7 @@ int __init construct_dom0(
         if ( pfn > REVERSE_START && (vinitrd_start || pfn < initrd_pfn) )
             mfn = alloc_epfn - (pfn - REVERSE_START);
 #endif
-        if ( !is_pv_32on64_domain(d) )
-            ((unsigned long *)vphysmap_start)[pfn] = mfn;
-        else
-            ((unsigned int *)vphysmap_start)[pfn] = mfn;
-        set_gpfn_from_mfn(mfn, pfn);
+        dom0_update_physmap(d, pfn, mfn, vphysmap_start);
         if (!(pfn & 0xfffff))
             process_pending_softirqs();
     }
@@ -1005,8 +1184,8 @@ int __init construct_dom0(
             if ( !page->u.inuse.type_info &&
                  !get_page_and_type(page, d, PGT_writable_page) )
                 BUG();
-            ((unsigned long *)vphysmap_start)[pfn] = mfn;
-            set_gpfn_from_mfn(mfn, pfn);
+
+            dom0_update_physmap(d, pfn, mfn, vphysmap_start);
             ++pfn;
             if (!(pfn & 0xfffff))
                 process_pending_softirqs();
@@ -1026,11 +1205,7 @@ int __init construct_dom0(
 #ifndef NDEBUG
 #define pfn (nr_pages - 1 - (pfn - (alloc_epfn - alloc_spfn)))
 #endif
-            if ( !is_pv_32on64_domain(d) )
-                ((unsigned long *)vphysmap_start)[pfn] = mfn;
-            else
-                ((unsigned int *)vphysmap_start)[pfn] = mfn;
-            set_gpfn_from_mfn(mfn, pfn);
+            dom0_update_physmap(d, pfn, mfn, vphysmap_start);
 #undef pfn
             page++; pfn++;
             if (!(pfn & 0xfffff))
@@ -1054,6 +1229,15 @@ int __init construct_dom0(
         si->console.dom0.info_size = sizeof(struct dom0_vga_console_info);
     }
 
+    /*
+     * PVH: We need to update si->shared_info while we are on dom0 page tables,
+     * but need to defer the p2m update until after we have fixed up the
+     * page tables for PVH so that the m2p for the si pte entry returns
+     * correct pfn.
+     */
+    if ( is_pvh_domain(d) )
+        si->shared_info = shared_info_paddr;
+
     if ( is_pv_32on64_domain(d) )
         xlat_start_info(si, XLAT_start_info_console_dom0);
 
@@ -1087,8 +1271,15 @@ int __init construct_dom0(
     regs->eflags = X86_EFLAGS_IF;
 
     if ( opt_dom0_shadow )
+    {
+        if ( is_pvh_domain(d) )
+        {
+            printk("Unsupported option dom0_shadow for PVH\n");
+            return -EINVAL;
+        }
         if ( paging_enable(d, PG_SH_enable) == 0 ) 
             paging_update_paging_modes(v);
+    }
 
     if ( supervisor_mode_kernel )
     {
@@ -1178,6 +1369,19 @@ int __init construct_dom0(
         printk(" Xen warning: dom0 kernel broken ELF: %s\n",
                elf_check_broken(&elf));
 
+    if ( is_pvh_domain(d) )
+    {
+        /* finally, fixup the page table, replacing mfns with pfns */
+        pvh_fixup_page_tables_for_hap(v, v_start, v_end);
+
+        /* the pt has correct pfn for si, now update the mfn in the p2m */
+        mfn = virt_to_mfn(d->shared_info);
+        pfn = shared_info_paddr >> PAGE_SHIFT;
+        dom0_update_physmap(d, pfn, mfn, 0);
+
+        pvh_map_all_iomem(d);
+    }
+
     iommu_dom0_init(dom0);
     return 0;
 
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index b8c5422..878697e 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -589,6 +589,21 @@ int hap_domctl(struct domain *d, xen_domctl_shadow_op_t *sc,
     }
 }
 
+void __init hap_set_pvh_alloc_for_dom0(struct domain *d,
+                                       unsigned long num_pages)
+{
+    int rc;
+    unsigned long memkb = num_pages * (PAGE_SIZE / 1024);
+
+    /* Copied from: libxl_get_required_shadow_memory() */
+    memkb = 4 * (256 * d->max_vcpus + 2 * (memkb / 1024));
+    num_pages = ( (memkb + 1023) / 1024) << (20 - PAGE_SHIFT);
+    paging_lock(d);
+    rc = hap_set_allocation(d, num_pages, NULL);
+    paging_unlock(d);
+    BUG_ON(rc);
+}
+
 static const struct paging_mode hap_paging_real_mode;
 static const struct paging_mode hap_paging_protected_mode;
 static const struct paging_mode hap_paging_pae_mode;
diff --git a/xen/include/asm-x86/hap.h b/xen/include/asm-x86/hap.h
index e03f983..aab8558 100644
--- a/xen/include/asm-x86/hap.h
+++ b/xen/include/asm-x86/hap.h
@@ -63,6 +63,7 @@ int   hap_track_dirty_vram(struct domain *d,
                            XEN_GUEST_HANDLE_64(uint8) dirty_bitmap);
 
 extern const struct paging_mode *hap_paging_get_mode(struct vcpu *);
+void hap_set_pvh_alloc_for_dom0(struct domain *d, unsigned long num_pages);
 
 #endif /* XEN_HAP_H */
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [V8 PATCH 3/8] pvh dom0: Introduce p2m_map_foreign
  2014-03-22  1:39 [V8 PATCH 0/8] pvh dom0 Mukesh Rathor
  2014-03-22  1:39 ` [V8 PATCH 1/8] pvh dom0: move some pv specific code to static functions Mukesh Rathor
  2014-03-22  1:39 ` [V8 PATCH 2/8] pvh dom0: construct_dom0 changes Mukesh Rathor
@ 2014-03-22  1:39 ` Mukesh Rathor
  2014-03-24  9:00   ` Jan Beulich
  2014-03-27 12:29   ` George Dunlap
  2014-03-22  1:39 ` [V8 PATCH 4/8] pvh dom0: make xsm_map_gmfn_foreign available for x86 Mukesh Rathor
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 51+ messages in thread
From: Mukesh Rathor @ 2014-03-22  1:39 UTC (permalink / raw)
  To: xen-devel; +Cc: George.Dunlap, keir.xen, tim, JBeulich

In this patch, a new type p2m_map_foreign is introduced for pages
that toolstack on an auto translated dom0 or a control domain maps
from foreign domains that its creating or supporting during it's
run time.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Acked-by: Tim Deegan <tim@xen.org>
---
 xen/arch/x86/mm/p2m-ept.c |  1 +
 xen/arch/x86/mm/p2m-pt.c  |  1 +
 xen/arch/x86/mm/p2m.c     | 30 ++++++++++++++++++++++--------
 xen/include/asm-x86/p2m.h |  2 ++
 4 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 92d9e2d..08d1d72 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -75,6 +75,7 @@ static void ept_p2m_type_to_flags(ept_entry_t *entry, p2m_type_t type, p2m_acces
             entry->w = 0;
             break;
         case p2m_grant_map_rw:
+        case p2m_map_foreign:
             entry->r = entry->w = 1;
             entry->x = 0;
             break;
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index a1d5650..09b60ce 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -89,6 +89,7 @@ static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn)
     case p2m_ram_rw:
         return flags | P2M_BASE_FLAGS | _PAGE_RW;
     case p2m_grant_map_rw:
+    case p2m_map_foreign:
         return flags | P2M_BASE_FLAGS | _PAGE_RW | _PAGE_NX_BIT;
     case p2m_mmio_direct:
         if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) )
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index c0ddef0..7050f6a 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -492,7 +492,7 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long gfn, unsigned long mfn,
         for ( i = 0; i < (1UL << page_order); i++ )
         {
             mfn_return = p2m->get_entry(p2m, gfn + i, &t, &a, 0, NULL);
-            if ( !p2m_is_grant(t) && !p2m_is_shared(t) )
+            if ( !p2m_is_grant(t) && !p2m_is_shared(t) && !p2m_is_foreign(t) )
                 set_gpfn_from_mfn(mfn+i, INVALID_M2P_ENTRY);
             ASSERT( !p2m_is_valid(t) || mfn + i == mfn_x(mfn_return) );
         }
@@ -723,10 +723,9 @@ void p2m_change_type_range(struct domain *d,
     p2m_unlock(p2m);
 }
 
-
-
-int
-set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+static int
+set_typed_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
+                    p2m_type_t gfn_p2mt)
 {
     int rc = 0;
     p2m_access_t a;
@@ -751,16 +750,31 @@ set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
         set_gpfn_from_mfn(mfn_x(omfn), INVALID_M2P_ENTRY);
     }
 
-    P2M_DEBUG("set mmio %lx %lx\n", gfn, mfn_x(mfn));
-    rc = set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_mmio_direct, p2m->default_access);
+    P2M_DEBUG("set %d %lx %lx\n", gfn_p2mt, gfn, mfn_x(mfn));
+    rc = set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, gfn_p2mt,
+                       p2m->default_access);
     gfn_unlock(p2m, gfn, 0);
     if ( 0 == rc )
         gdprintk(XENLOG_ERR,
-            "set_mmio_p2m_entry: set_p2m_entry failed! mfn=%08lx\n",
+            "%s: set_p2m_entry failed! mfn=%08lx\n", __func__,
             mfn_x(get_gfn_query_unlocked(p2m->domain, gfn, &ot)));
     return rc;
 }
 
+/* Set foreign mfn in the given guest's p2m table.
+ * Returns: True for success. */
+static int __attribute__((unused))
+set_foreign_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+{
+    return set_typed_p2m_entry(d, gfn, mfn, p2m_map_foreign);
+}
+
+int
+set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
+{
+    return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct);
+}
+
 int
 clear_mmio_p2m_entry(struct domain *d, unsigned long gfn)
 {
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index f4e7253..ad2383e 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -70,6 +70,7 @@ typedef enum {
     p2m_ram_paging_in = 11,       /* Memory that is being paged in */
     p2m_ram_shared = 12,          /* Shared or sharable memory */
     p2m_ram_broken = 13,          /* Broken page, access cause domain crash */
+    p2m_map_foreign  = 14,        /* ram pages from foreign domain */
 } p2m_type_t;
 
 /*
@@ -180,6 +181,7 @@ typedef unsigned int p2m_query_t;
 #define p2m_is_sharable(_t) (p2m_to_mask(_t) & P2M_SHARABLE_TYPES)
 #define p2m_is_shared(_t)   (p2m_to_mask(_t) & P2M_SHARED_TYPES)
 #define p2m_is_broken(_t)   (p2m_to_mask(_t) & P2M_BROKEN_TYPES)
+#define p2m_is_foreign(_t)  (p2m_to_mask(_t) & p2m_to_mask(p2m_map_foreign))
 
 /* Per-p2m-table state */
 struct p2m_domain {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [V8 PATCH 4/8] pvh dom0: make xsm_map_gmfn_foreign available for x86
  2014-03-22  1:39 [V8 PATCH 0/8] pvh dom0 Mukesh Rathor
                   ` (2 preceding siblings ...)
  2014-03-22  1:39 ` [V8 PATCH 3/8] pvh dom0: Introduce p2m_map_foreign Mukesh Rathor
@ 2014-03-22  1:39 ` Mukesh Rathor
  2014-03-25 17:53   ` Daniel De Graaf
  2014-03-22  1:39 ` [V8 PATCH 5/8] pvh dom0: Add and remove foreign pages Mukesh Rathor
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 51+ messages in thread
From: Mukesh Rathor @ 2014-03-22  1:39 UTC (permalink / raw)
  To: xen-devel; +Cc: George.Dunlap, julien.grall, tim, keir.xen, JBeulich, dgdegra

In this patch we make xsm_map_gmfn_foreign available for x86 also. This
is used in the next patch "pvh dom0: Add and remove foreign pages" in
function p2m_add_foreign.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/include/xsm/dummy.h | 2 --
 xen/include/xsm/xsm.h   | 4 ----
 xen/xsm/dummy.c         | 2 --
 xen/xsm/flask/hooks.c   | 4 ----
 4 files changed, 12 deletions(-)

diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 3bcd941..eb7aeb8 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -622,10 +622,8 @@ static XSM_INLINE int xsm_ioport_mapping(XSM_DEFAULT_ARG struct domain *d, uint3
 
 #endif /* CONFIG_X86 */
 
-#ifdef CONFIG_ARM
 static XSM_INLINE int xsm_map_gmfn_foreign(XSM_DEFAULT_ARG struct domain *d, struct domain *t)
 {
     XSM_ASSERT_ACTION(XSM_TARGET);
     return xsm_default_action(action, d, t);
 }
-#endif
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index de9cf86..6281e29 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -164,9 +164,7 @@ struct xsm_operations {
     int (*ioport_permission) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
     int (*ioport_mapping) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
 #endif
-#ifdef CONFIG_ARM
     int (*map_gmfn_foreign) (struct domain *d, struct domain *t);
-#endif
 };
 
 #ifdef XSM_ENABLE
@@ -630,12 +628,10 @@ static inline int xsm_ioport_mapping (xsm_default_t def, struct domain *d, uint3
 }
 #endif /* CONFIG_X86 */
 
-#ifdef CONFIG_ARM
 static inline int xsm_map_gmfn_foreign (xsm_default_t def, struct domain *d, struct domain *t)
 {
     return xsm_ops->map_gmfn_foreign(d, t);
 }
-#endif /* CONFIG_ARM */
 
 #endif /* XSM_NO_WRAPPERS */
 
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 3fe4c59..0234490 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -134,7 +134,5 @@ void xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, ioport_permission);
     set_to_dummy_if_null(ops, ioport_mapping);
 #endif
-#ifdef CONFIG_ARM
     set_to_dummy_if_null(ops, map_gmfn_foreign);
-#endif
 }
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 96276ac..df92953 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1450,12 +1450,10 @@ static int flask_unbind_pt_irq (struct domain *d, struct xen_domctl_bind_pt_irq
 }
 #endif /* CONFIG_X86 */
 
-#ifdef CONFIG_ARM
 static int flask_map_gmfn_foreign(struct domain *d, struct domain *t)
 {
     return domain_has_perm(d, t, SECCLASS_MMU, MMU__MAP_READ | MMU__MAP_WRITE);
 }
-#endif
 
 long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
 int compat_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
@@ -1566,9 +1564,7 @@ static struct xsm_operations flask_ops = {
     .ioport_permission = flask_ioport_permission,
     .ioport_mapping = flask_ioport_mapping,
 #endif
-#ifdef CONFIG_ARM
     .map_gmfn_foreign = flask_map_gmfn_foreign,
-#endif
 };
 
 static __init int flask_init(void)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [V8 PATCH 5/8] pvh dom0: Add and remove foreign pages
  2014-03-22  1:39 [V8 PATCH 0/8] pvh dom0 Mukesh Rathor
                   ` (3 preceding siblings ...)
  2014-03-22  1:39 ` [V8 PATCH 4/8] pvh dom0: make xsm_map_gmfn_foreign available for x86 Mukesh Rathor
@ 2014-03-22  1:39 ` Mukesh Rathor
  2014-03-24  9:26   ` Jan Beulich
  2014-03-22  1:39 ` [V8 PATCH 6/8] pvh dom0: allow get_pg_owner for translated domains Mukesh Rathor
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 51+ messages in thread
From: Mukesh Rathor @ 2014-03-22  1:39 UTC (permalink / raw)
  To: xen-devel
  Cc: JBeulich, George.Dunlap, tim, eddie.dong, keir.xen, jun.nakajima

In this patch, a new function, p2m_add_foreign(), is added
to map pages from foreign guest into dom0 for domU creation.
Such pages are typed p2m_map_foreign.  Note, it is the nature
of such pages that a refcnt is held during their stay in the p2m.
The refcnt is added and released in the low level ept function
atomic_write_ept_entry for convenience.  Finally, p2m_teardown adds a
call to allow for the cleanup of foreign pages during it's destruction.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/mm.c         |  5 +++
 xen/arch/x86/mm/p2m-ept.c | 34 ++++++++++++++---
 xen/arch/x86/mm/p2m-pt.c  |  9 ++++-
 xen/arch/x86/mm/p2m.c     | 95 +++++++++++++++++++++++++++++++++++++++++++++--
 xen/include/asm-x86/p2m.h |  3 ++
 5 files changed, 135 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index fdc5ed3..fcd3178 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4583,6 +4583,11 @@ int xenmem_add_to_physmap_one(
             page = mfn_to_page(mfn);
             break;
         }
+        case XENMAPSPACE_gmfn_foreign:
+        {
+            rc = p2m_add_foreign(d, idx, gpfn, foreign_domid);
+            return rc;
+        }
         default:
             break;
     }
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 08d1d72..beb7288 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -36,8 +36,6 @@
 
 #define atomic_read_ept_entry(__pepte)                              \
     ( (ept_entry_t) { .epte = read_atomic(&(__pepte)->epte) } )
-#define atomic_write_ept_entry(__pepte, __epte)                     \
-    write_atomic(&(__pepte)->epte, (__epte).epte)
 
 #define is_epte_present(ept_entry)      ((ept_entry)->epte & 0x7)
 #define is_epte_superpage(ept_entry)    ((ept_entry)->sp)
@@ -46,6 +44,30 @@ static inline bool_t is_epte_valid(ept_entry_t *e)
     return (e->epte != 0 && e->sa_p2mt != p2m_invalid);
 }
 
+static inline void atomic_write_ept_entry(ept_entry_t *entryptr,
+                                          const ept_entry_t *new)
+{
+    unsigned long oldmfn = 0;
+
+    if ( p2m_is_foreign(new->sa_p2mt) )
+    {
+        struct page_info *page;
+        struct domain *fdom;
+
+        ASSERT(mfn_valid(new->mfn));
+        page = mfn_to_page(new->mfn);
+        fdom = page_get_owner(page);
+        get_page(page, fdom);
+    }
+    if ( p2m_is_foreign(entryptr->sa_p2mt) )
+        oldmfn = entryptr->mfn;
+
+    write_atomic(&entryptr->epte, new->epte);
+
+    if ( oldmfn )
+        put_page(mfn_to_page(oldmfn));
+}
+
 static void ept_p2m_type_to_flags(ept_entry_t *entry, p2m_type_t type, p2m_access_t access)
 {
     /* First apply type permissions */
@@ -378,7 +400,7 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
             ept_p2m_type_to_flags(&new_entry, p2mt, p2ma);
         }
 
-        atomic_write_ept_entry(ept_entry, new_entry);
+        atomic_write_ept_entry(ept_entry, &new_entry);
     }
     else
     {
@@ -398,7 +420,7 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
 
         /* now install the newly split ept sub-tree */
         /* NB: please make sure domian is paused and no in-fly VT-d DMA. */
-        atomic_write_ept_entry(ept_entry, split_ept_entry);
+        atomic_write_ept_entry(ept_entry, &split_ept_entry);
 
         /* then move to the level we want to make real changes */
         for ( ; i > target; i-- )
@@ -428,7 +450,7 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
 
         ept_p2m_type_to_flags(&new_entry, p2mt, p2ma);
 
-        atomic_write_ept_entry(ept_entry, new_entry);
+        atomic_write_ept_entry(ept_entry, &new_entry);
     }
 
     /* Track the highest gfn for which we have ever had a valid mapping */
@@ -665,7 +687,7 @@ static void ept_change_entry_type_page(mfn_t ept_page_mfn, int ept_page_level,
 
             e.sa_p2mt = nt;
             ept_p2m_type_to_flags(&e, nt, e.access);
-            atomic_write_ept_entry(&epte[i], e);
+            atomic_write_ept_entry(&epte[i], &e);
         }
     }
 
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index 09b60ce..766fd67 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -276,7 +276,7 @@ p2m_next_level(struct p2m_domain *p2m, mfn_t *table_mfn, void **table,
     return 1;
 }
 
-// Returns 0 on error (out of memory)
+// Returns 0 on error (out of memory, or unimplemented p2m type)
 static int
 p2m_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, 
               unsigned int page_order, p2m_type_t p2mt, p2m_access_t p2ma)
@@ -312,6 +312,13 @@ p2m_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         __trace_var(TRC_MEM_SET_P2M_ENTRY, 0, sizeof(t), &t);
     }
 
+    if ( p2m_is_foreign(p2mt) )
+    {
+        /* pvh fixme: foreign types are only supported on ept at present */
+        gdprintk(XENLOG_WARNING, "Unimplemented foreign p2m type.\n");
+        return 0;
+    }
+
     if ( !p2m_next_level(p2m, &table_mfn, &table, &gfn_remainder, gfn,
                          L4_PAGETABLE_SHIFT - PAGE_SHIFT,
                          L4_PAGETABLE_ENTRIES, PGT_l3_page_table) )
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 7050f6a..52dd1e4 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -36,6 +36,7 @@
 #include <xen/event.h>
 #include <asm/hvm/nestedhvm.h>
 #include <asm/hvm/svm/amd-iommu-proto.h>
+#include <xsm/xsm.h>
 
 #include "mm-locks.h"
 
@@ -275,14 +276,19 @@ struct page_info *get_page_from_gfn_p2m(
         /* Fast path: look up and get out */
         p2m_read_lock(p2m);
         mfn = __get_gfn_type_access(p2m, gfn, t, a, 0, NULL, 0);
-        if ( (p2m_is_ram(*t) || p2m_is_grant(*t))
+        if ( (p2m_is_ram(*t) || p2m_is_grant(*t) || p2m_is_foreign(*t) )
              && mfn_valid(mfn)
              && !((q & P2M_UNSHARE) && p2m_is_shared(*t)) )
         {
             page = mfn_to_page(mfn);
-            if ( !get_page(page, d)
-                 /* Page could be shared */
-                 && !get_page(page, dom_cow) )
+            if ( p2m_is_foreign(*t) )
+            {
+                struct domain *fdom = page_get_owner_and_reference(page);
+                ASSERT(fdom && fdom != d);
+            }
+            else if ( !get_page(page, d)
+                      /* Page could be shared */
+                      && !get_page(page, dom_cow) )
                 page = NULL;
         }
         p2m_read_unlock(p2m);
@@ -447,6 +453,10 @@ void p2m_teardown(struct p2m_domain *p2m)
     d = p2m->domain;
 
     p2m_lock(p2m);
+    /* pvh: we must release refcnt on all foreign pages */
+    if ( is_pvh_domain(d) )
+        p2m_change_entry_type_global(d, p2m_map_foreign, p2m_invalid);
+
     ASSERT(atomic_read(&d->shr_pages) == 0);
     p2m->phys_table = pagetable_null();
 
@@ -1710,6 +1720,83 @@ out_p2m_audit:
 #endif /* P2M_AUDIT */
 
 /*
+ * Add frames from foreign domain to target domain's physmap. Similar to
+ * XENMAPSPACE_gmfn but the frame is foreign being mapped into current,
+ * and is not removed from foreign domain.
+ * Usage: libxl on pvh dom0 creating a guest and doing privcmd_ioctl_mmap.
+ * Side Effect: the mfn for fgfn will be refcounted in lower level routines
+ *              so it is not lost while mapped here. The refcnt is released
+ *              via the XENMEM_remove_from_physmap path.
+ * Returns: 0 ==> success
+ */
+int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
+                    unsigned long gpfn, domid_t fdid)
+{
+    int rc = -ESRCH;
+    p2m_type_t p2mt, p2mt_prev;
+    unsigned long prev_mfn, mfn;
+    struct page_info *page = NULL;
+    struct domain *fdom = NULL;
+
+    /* xentrace mapping pages from xen heap are owned by DOMID_XEN */
+    if ( (fdid == DOMID_XEN && (fdom = rcu_lock_domain(dom_xen)) == NULL) ||
+         (fdid != DOMID_XEN && (fdom = rcu_lock_domain_by_id(fdid)) == NULL) )
+        goto out;
+
+    if ( (rc = xsm_map_gmfn_foreign(XSM_TARGET, tdom, fdom)) )
+        goto out;
+
+    rc = -EINVAL;
+    if ( tdom == fdom || !tdom || !fdom || !is_pvh_domain(tdom) )
+        goto out;
+
+    /* following will take a refcnt on the mfn */
+    page = get_page_from_gfn(fdom, fgfn, &p2mt, P2M_ALLOC);
+    if ( !page || !p2m_is_valid(p2mt) )
+    {
+        if ( page )
+            put_page(page);
+        goto out;
+    }
+    mfn = mfn_x(page_to_mfn(page));
+
+    /* Remove previously mapped page if it is present. */
+    prev_mfn = mfn_x(get_gfn(tdom, gpfn, &p2mt_prev));
+    if ( mfn_valid(_mfn(prev_mfn)) )
+    {
+        if ( is_xen_heap_mfn(prev_mfn) )
+            /* Xen heap frames are simply unhooked from this phys slot */
+            guest_physmap_remove_page(tdom, gpfn, prev_mfn, 0);
+        else
+            /* Normal domain memory is freed, to avoid leaking memory. */
+            guest_remove_page(tdom, gpfn);
+    }
+    /*
+     * Create the new mapping. Can't use guest_physmap_add_page() because it
+     * will update the m2p table which will result in  mfn -> gpfn of dom0
+     * and not fgfn of domU.
+     */
+    if ( set_foreign_p2m_entry(tdom, gpfn, _mfn(mfn)) == 0 )
+        gdprintk(XENLOG_WARNING, "set_foreign_p2m_entry failed. "
+                 "gpfn:%lx mfn:%lx fgfn:%lx td:%d fd:%d\n",
+                 gpfn, mfn, fgfn, tdom->domain_id, fdom->domain_id);
+    else
+        rc = 0;
+
+    put_page(page);
+
+    /*
+     * We must do this put_gfn after set_foreign_p2m_entry so another cpu
+     * doesn't populate the gpfn before us.
+     */
+    put_gfn(tdom, gpfn);
+
+out:
+    if ( fdom )
+        rcu_unlock_domain(fdom);
+    return rc;
+}
+/*
  * Local variables:
  * mode: C
  * c-file-style: "BSD"
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index ad2383e..9e691f2 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -512,6 +512,9 @@ p2m_type_t p2m_change_type(struct domain *d, unsigned long gfn,
 int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn);
 
+/* Add foreign mapping to the guest's p2m table. */
+int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
+                    unsigned long gpfn, domid_t foreign_domid);
 
 /* 
  * Populate-on-demand
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [V8 PATCH 6/8] pvh dom0: allow get_pg_owner for translated domains
  2014-03-22  1:39 [V8 PATCH 0/8] pvh dom0 Mukesh Rathor
                   ` (4 preceding siblings ...)
  2014-03-22  1:39 ` [V8 PATCH 5/8] pvh dom0: Add and remove foreign pages Mukesh Rathor
@ 2014-03-22  1:39 ` Mukesh Rathor
  2014-03-24  9:31   ` Jan Beulich
  2014-03-22  1:39 ` [V8 PATCH 7/8] pvh dom0: add check for pvh in vioapic_range Mukesh Rathor
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 51+ messages in thread
From: Mukesh Rathor @ 2014-03-22  1:39 UTC (permalink / raw)
  To: xen-devel; +Cc: George.Dunlap, keir.xen, tim, JBeulich

With pvh, a translated domain, get_pg_owner must allow foreign mappings.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/mm.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index fcd3178..27ac807 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2810,12 +2810,6 @@ static struct domain *get_pg_owner(domid_t domid)
         goto out;
     }
 
-    if ( unlikely(paging_mode_translate(curr)) )
-    {
-        MEM_LOG("Cannot mix foreign mappings with translated domains");
-        goto out;
-    }
-
     switch ( domid )
     {
     case DOMID_IO:
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [V8 PATCH 7/8] pvh dom0: add check for pvh in vioapic_range
  2014-03-22  1:39 [V8 PATCH 0/8] pvh dom0 Mukesh Rathor
                   ` (5 preceding siblings ...)
  2014-03-22  1:39 ` [V8 PATCH 6/8] pvh dom0: allow get_pg_owner for translated domains Mukesh Rathor
@ 2014-03-22  1:39 ` Mukesh Rathor
  2014-03-24  9:34   ` Jan Beulich
  2014-03-22  1:39 ` [V8 PATCH 8/8] pvh dom0: add opt_dom0pvh to setup.c Mukesh Rathor
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 51+ messages in thread
From: Mukesh Rathor @ 2014-03-22  1:39 UTC (permalink / raw)
  To: xen-devel
  Cc: JBeulich, George.Dunlap, tim, eddie.dong, keir.xen, jun.nakajima

pvh doesn't use apic emulation, as a result vioapic_init is not called
and vioapic ptr in struct hvm_domain is not initialized. One path that
would access the ptr is hvm_hap_nested_page_fault -> handle_mmio ->
    hvmemul_do_io -> hvm_mmio_intercept -> vioapic_range

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/hvm/vioapic.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index d3c681b..8196268 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -238,8 +238,13 @@ static int vioapic_write(
 
 static int vioapic_range(struct vcpu *v, unsigned long addr)
 {
-    struct hvm_hw_vioapic *vioapic = domain_vioapic(v->domain);
+    struct hvm_hw_vioapic *vioapic;
+
+    /* pvh uses event channel callback */
+    if ( is_pvh_vcpu(v) )
+        return 0;
 
+    vioapic = domain_vioapic(v->domain);
     return ((addr >= vioapic->base_address &&
              (addr < vioapic->base_address + VIOAPIC_MEM_LENGTH)));
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [V8 PATCH 8/8] pvh dom0: add opt_dom0pvh to setup.c
  2014-03-22  1:39 [V8 PATCH 0/8] pvh dom0 Mukesh Rathor
                   ` (6 preceding siblings ...)
  2014-03-22  1:39 ` [V8 PATCH 7/8] pvh dom0: add check for pvh in vioapic_range Mukesh Rathor
@ 2014-03-22  1:39 ` Mukesh Rathor
  2014-03-24  9:35   ` Jan Beulich
  2014-03-24  8:57 ` [V8 PATCH 0/8] pvh dom0 Jan Beulich
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 51+ messages in thread
From: Mukesh Rathor @ 2014-03-22  1:39 UTC (permalink / raw)
  To: xen-devel; +Cc: George.Dunlap, keir.xen, tim, JBeulich

A pvh dom0 is booted by adding dom0pvh to grub xen command line.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 docs/misc/pvh-readme.txt |  2 ++
 xen/arch/x86/setup.c     | 12 +++++++++---
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/docs/misc/pvh-readme.txt b/docs/misc/pvh-readme.txt
index 9fea137..c5b3de4 100644
--- a/docs/misc/pvh-readme.txt
+++ b/docs/misc/pvh-readme.txt
@@ -37,6 +37,8 @@ supported. Phase I patches are broken into three parts:
    - tools changes for creating a PVH guest
    - boot of 64bit dom0 in PVH mode.
 
+To boot 64bit dom0 in PVH mode, add dom0pvh to grub xen command line.
+
 Following fixme's exist in the code:
   - arch/x86/time.c: support more tsc modes.
 
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 4dbf2b7..1283455 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -61,6 +61,10 @@ integer_param("maxcpus", max_cpus);
 static bool_t __initdata disable_smep;
 invbool_param("smep", disable_smep);
 
+/* Boot dom0 in pvh mode */
+bool_t __initdata opt_dom0pvh;
+boolean_param("dom0pvh", opt_dom0pvh);
+
 /* **** Linux config option: propagated to domain0. */
 /* "acpi=off":    Sisables both ACPI table parsing and interpreter. */
 /* "acpi=force":  Override the disable blacklist.                   */
@@ -541,7 +545,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
 {
     char *memmap_type = NULL;
     char *cmdline, *kextra, *loader;
-    unsigned int initrdidx;
+    unsigned int initrdidx, domcr_flags = 0;
     multiboot_info_t *mbi = __va(mbi_p);
     module_t *mod = (module_t *)__va(mbi->mods_addr);
     unsigned long nr_pages, raw_max_page, modules_headroom, *module_map;
@@ -1337,8 +1341,10 @@ void __init noreturn __start_xen(unsigned long mbi_p)
     if ( !tboot_protect_mem_regions() )
         panic("Could not protect TXT memory regions");
 
-    /* Create initial domain 0. */
-    dom0 = domain_create(0, DOMCRF_s3_integrity, 0);
+     /* Create initial domain 0. */
+     domcr_flags = (opt_dom0pvh ? DOMCRF_pvh | DOMCRF_hap : 0);
+     domcr_flags |= DOMCRF_s3_integrity;
+     dom0 = domain_create(0, domcr_flags, 0);
     if ( IS_ERR(dom0) || (alloc_dom0_vcpu0() == NULL) )
         panic("Error creating domain 0");
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 0/8] pvh dom0....
  2014-03-22  1:39 [V8 PATCH 0/8] pvh dom0 Mukesh Rathor
                   ` (7 preceding siblings ...)
  2014-03-22  1:39 ` [V8 PATCH 8/8] pvh dom0: add opt_dom0pvh to setup.c Mukesh Rathor
@ 2014-03-24  8:57 ` Jan Beulich
  2014-03-24 21:36   ` Mukesh Rathor
  2014-03-28 17:36 ` Roger Pau Monné
  2014-04-01 16:04 ` George Dunlap
  10 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2014-03-24  8:57 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: George.Dunlap, xen-devel, keir.xen, tim

>>> On 22.03.14 at 02:39, <mukesh.rathor@oracle.com> wrote:
> Patch 2: comment changes (based on feedback from Konrad):
> 
>   -            /* Unused RAM areas are marked UNUSABLE, so skip it too */
>   +            /* Unused RAM areas are marked UNUSABLE, so skip them too */
> 
> -    /* If the e820 ended under 4GB, we must map the remaining space upto 4GB */
> +    /* 
> +     * Some BIOSes may not report io space above ram that is less than 4GB. So
> +     * we map any non-ram upto 4GB.
> +     */
> 
> Jan, I retained Reviewed-by. If above voids the "reviewed-by", I'll just undo
> the comment changes. Please lmk if that's the case.

Definitely not - you'd have to make rather extensive changes to
comments to void a Reviewed-by (or a patch would have to consist
mostly of comments or changes thereto).

> Caveat:
> 
> With these patches, an HVM guest during boot intermittantly takes
> unresolved fault in load_module() path. This for PVH dom0 and not PV, and
> latest upstream kernel. I was not able to reproduce with older stock 
> fedora 19 kernel. So, it appears the problem might be on the guest side, 
> and not xen. Hence I decided to go ahead and submit these while I debug 
> that.

Without proper understanding of this I don't see a way for us to apply
the series, and it should therefore have been tagged as RFC. Just to
avoid any misunderstanding - even if the root cause was on the guest
side, if without your patches the issue never surfaces, this would still
need to be treated as a regression, and a way would need to be found
to make those guests work flawlessly again. Furthermore, with the
issue being in HVM guests, it is rather unlikely (but admittedly not entirely
impossible) that the issue is in the guest, as similar problems aren't - aiui-
known from Linux running on native hardware.

Apart from that, assuming you spent quite some time in isolating the
problem - did you determine which of the patches in this series
introduces it? Knowing that might help in determining possible reasons
(including having others help you with that).

Jan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 3/8] pvh dom0: Introduce p2m_map_foreign
  2014-03-22  1:39 ` [V8 PATCH 3/8] pvh dom0: Introduce p2m_map_foreign Mukesh Rathor
@ 2014-03-24  9:00   ` Jan Beulich
  2014-03-27 12:29   ` George Dunlap
  1 sibling, 0 replies; 51+ messages in thread
From: Jan Beulich @ 2014-03-24  9:00 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: George.Dunlap, xen-devel, keir.xen, tim

>>> On 22.03.14 at 02:39, <mukesh.rathor@oracle.com> wrote:
> In this patch, a new type p2m_map_foreign is introduced for pages
> that toolstack on an auto translated dom0 or a control domain maps
> from foreign domains that its creating or supporting during it's
> run time.

I think you need to swap "its" and "it's".

Jan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 5/8] pvh dom0: Add and remove foreign pages
  2014-03-22  1:39 ` [V8 PATCH 5/8] pvh dom0: Add and remove foreign pages Mukesh Rathor
@ 2014-03-24  9:26   ` Jan Beulich
  2014-04-05  1:17     ` Mukesh Rathor
  2014-04-11  1:33     ` Mukesh Rathor
  0 siblings, 2 replies; 51+ messages in thread
From: Jan Beulich @ 2014-03-24  9:26 UTC (permalink / raw)
  To: Mukesh Rathor
  Cc: George.Dunlap, tim, eddie.dong, keir.xen, jun.nakajima, xen-devel

>>> On 22.03.14 at 02:39, <mukesh.rathor@oracle.com> wrote:
> +static inline void atomic_write_ept_entry(ept_entry_t *entryptr,
> +                                          const ept_entry_t *new)
> +{
> +    unsigned long oldmfn = 0;
> +
> +    if ( p2m_is_foreign(new->sa_p2mt) )
> +    {
> +        struct page_info *page;
> +        struct domain *fdom;
> +
> +        ASSERT(mfn_valid(new->mfn));
> +        page = mfn_to_page(new->mfn);
> +        fdom = page_get_owner(page);
> +        get_page(page, fdom);

I'm afraid the checking here is too weak, or at least inconsistent
(considering the ASSERT()): mfn_valid() isn't a sufficient condition
for page_get_owner() to return non-NULL or get_page() to
succeed. If all callers are supposed to guarantee this, then further
ASSERT()s should be added here. If not, error checking is going to
be necessary (and possibly the ASSERT() then also needs to be
converted to an error check).

I also wonder whether page_get_owner_and_reference() wouldn't
be more appropriate to be used here.

> @@ -275,14 +276,19 @@ struct page_info *get_page_from_gfn_p2m(
>          /* Fast path: look up and get out */
>          p2m_read_lock(p2m);
>          mfn = __get_gfn_type_access(p2m, gfn, t, a, 0, NULL, 0);
> -        if ( (p2m_is_ram(*t) || p2m_is_grant(*t))
> +        if ( (p2m_is_ram(*t) || p2m_is_grant(*t) || p2m_is_foreign(*t) )

Would this perhaps better become something like p2m_is_any_ram()?
(I'm in fact surprised this is only needed in exactly one place.)

> +int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
> +                    unsigned long gpfn, domid_t fdid)
> +{
> +    int rc = -ESRCH;
> +    p2m_type_t p2mt, p2mt_prev;
> +    unsigned long prev_mfn, mfn;
> +    struct page_info *page = NULL;
> +    struct domain *fdom = NULL;
> +
> +    /* xentrace mapping pages from xen heap are owned by DOMID_XEN */
> +    if ( (fdid == DOMID_XEN && (fdom = rcu_lock_domain(dom_xen)) == NULL) ||
> +         (fdid != DOMID_XEN && (fdom = rcu_lock_domain_by_id(fdid)) == NULL) )
> +        goto out;

Didn't you mean to call get_pg_owner() here, at once taking care of
DOMID_IO too? Also I don't think the reference to xentrace is really
useful here - DOMID_XEN owned pages certainly exist elsewhere too.

Of course the question is whether for the purposes you have here
DOMID_XEN/DOMID_IO owned pages are actually validly making it
here.

> +
> +    if ( (rc = xsm_map_gmfn_foreign(XSM_TARGET, tdom, fdom)) )
> +        goto out;
> +
> +    rc = -EINVAL;
> +    if ( tdom == fdom || !tdom || !fdom || !is_pvh_domain(tdom) )
> +        goto out;
> +
> +    /* following will take a refcnt on the mfn */
> +    page = get_page_from_gfn(fdom, fgfn, &p2mt, P2M_ALLOC);
> +    if ( !page || !p2m_is_valid(p2mt) )
> +    {
> +        if ( page )
> +            put_page(page);
> +        goto out;
> +    }
> +    mfn = mfn_x(page_to_mfn(page));
> +
> +    /* Remove previously mapped page if it is present. */
> +    prev_mfn = mfn_x(get_gfn(tdom, gpfn, &p2mt_prev));
> +    if ( mfn_valid(_mfn(prev_mfn)) )
> +    {
> +        if ( is_xen_heap_mfn(prev_mfn) )
> +            /* Xen heap frames are simply unhooked from this phys slot */
> +            guest_physmap_remove_page(tdom, gpfn, prev_mfn, 0);
> +        else
> +            /* Normal domain memory is freed, to avoid leaking memory. */
> +            guest_remove_page(tdom, gpfn);
> +    }
> +    /*
> +     * Create the new mapping. Can't use guest_physmap_add_page() because it
> +     * will update the m2p table which will result in  mfn -> gpfn of dom0
> +     * and not fgfn of domU.
> +     */
> +    if ( set_foreign_p2m_entry(tdom, gpfn, _mfn(mfn)) == 0 )
> +        gdprintk(XENLOG_WARNING, "set_foreign_p2m_entry failed. "
> +                 "gpfn:%lx mfn:%lx fgfn:%lx td:%d fd:%d\n",
> +                 gpfn, mfn, fgfn, tdom->domain_id, fdom->domain_id);
> +    else
> +        rc = 0;

Can't you make set_foreign_p2m_entry() return a proper error code
(if it doesn't already) and use that here instead of the relatively
meaningless -EINVAL inherited from above (I suppose the function
could e.g. also return -ENOMEM)?

> +
> +    put_page(page);
> +
> +    /*
> +     * We must do this put_gfn after set_foreign_p2m_entry so another cpu
> +     * doesn't populate the gpfn before us.
> +     */
> +    put_gfn(tdom, gpfn);

Might be worth mentioning in the comment which get_...() this pairs
with, as that's not obvious from the code elsewhere in the function
(or the patch as a whole).

Jan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 6/8] pvh dom0: allow get_pg_owner for translated domains
  2014-03-22  1:39 ` [V8 PATCH 6/8] pvh dom0: allow get_pg_owner for translated domains Mukesh Rathor
@ 2014-03-24  9:31   ` Jan Beulich
  2014-04-01 14:31     ` George Dunlap
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2014-03-24  9:31 UTC (permalink / raw)
  To: Mukesh Rathor; +Cc: George.Dunlap, xen-devel, keir.xen, tim

>>> On 22.03.14 at 02:39, <mukesh.rathor@oracle.com> wrote:
> With pvh, a translated domain, get_pg_owner must allow foreign mappings.

And what about HVM domains? Can you really safely drop this namely
with the possible use of this function in an earlier patch of this series?
(Without that new use, the change of course is safe assuming that
PV domains are never translated; I'm not sure however whether this
is being explicitly guaranteed anywhere these days, or just "a fact").

Jan

> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> ---
>  xen/arch/x86/mm.c | 6 ------
>  1 file changed, 6 deletions(-)
> 
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index fcd3178..27ac807 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -2810,12 +2810,6 @@ static struct domain *get_pg_owner(domid_t domid)
>          goto out;
>      }
>  
> -    if ( unlikely(paging_mode_translate(curr)) )
> -    {
> -        MEM_LOG("Cannot mix foreign mappings with translated domains");
> -        goto out;
> -    }
> -
>      switch ( domid )
>      {
>      case DOMID_IO:
> -- 
> 1.8.3.1

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 7/8] pvh dom0: add check for pvh in vioapic_range
  2014-03-22  1:39 ` [V8 PATCH 7/8] pvh dom0: add check for pvh in vioapic_range Mukesh Rathor
@ 2014-03-24  9:34   ` Jan Beulich
  2014-04-01 14:40     ` George Dunlap
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2014-03-24  9:34 UTC (permalink / raw)
  To: Mukesh Rathor
  Cc: George.Dunlap, tim, eddie.dong, keir.xen, jun.nakajima, xen-devel

>>> On 22.03.14 at 02:39, <mukesh.rathor@oracle.com> wrote:
> --- a/xen/arch/x86/hvm/vioapic.c
> +++ b/xen/arch/x86/hvm/vioapic.c
> @@ -238,8 +238,13 @@ static int vioapic_write(
>  
>  static int vioapic_range(struct vcpu *v, unsigned long addr)
>  {
> -    struct hvm_hw_vioapic *vioapic = domain_vioapic(v->domain);
> +    struct hvm_hw_vioapic *vioapic;
> +
> +    /* pvh uses event channel callback */
> +    if ( is_pvh_vcpu(v) )
> +        return 0;
>  
> +    vioapic = domain_vioapic(v->domain);

I can see why the extra check is needed, but I can't see why you
convert the initializer to an assignment: Afaict domain_vioapic() is
safe even if d->arch.hvm_domain.vioapic == NULL.

Jan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 8/8] pvh dom0: add opt_dom0pvh to setup.c
  2014-03-22  1:39 ` [V8 PATCH 8/8] pvh dom0: add opt_dom0pvh to setup.c Mukesh Rathor
@ 2014-03-24  9:35   ` Jan Beulich
  0 siblings, 0 replies; 51+ messages in thread
From: Jan Beulich @ 2014-03-24  9:35 UTC (permalink / raw)
  To: xen-devel, Mukesh Rathor; +Cc: George.Dunlap, keir.xen, tim

>>> On 22.03.14 at 02:39, <mukesh.rathor@oracle.com> wrote:
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -61,6 +61,10 @@ integer_param("maxcpus", max_cpus);
>  static bool_t __initdata disable_smep;
>  invbool_param("smep", disable_smep);
>  
> +/* Boot dom0 in pvh mode */
> +bool_t __initdata opt_dom0pvh;

static

Jan

> +boolean_param("dom0pvh", opt_dom0pvh);
> +
>  /* **** Linux config option: propagated to domain0. */
>  /* "acpi=off":    Sisables both ACPI table parsing and interpreter. */
>  /* "acpi=force":  Override the disable blacklist.                   */
> @@ -541,7 +545,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>  {
>      char *memmap_type = NULL;
>      char *cmdline, *kextra, *loader;
> -    unsigned int initrdidx;
> +    unsigned int initrdidx, domcr_flags = 0;
>      multiboot_info_t *mbi = __va(mbi_p);
>      module_t *mod = (module_t *)__va(mbi->mods_addr);
>      unsigned long nr_pages, raw_max_page, modules_headroom, *module_map;
> @@ -1337,8 +1341,10 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>      if ( !tboot_protect_mem_regions() )
>          panic("Could not protect TXT memory regions");
>  
> -    /* Create initial domain 0. */
> -    dom0 = domain_create(0, DOMCRF_s3_integrity, 0);
> +     /* Create initial domain 0. */
> +     domcr_flags = (opt_dom0pvh ? DOMCRF_pvh | DOMCRF_hap : 0);
> +     domcr_flags |= DOMCRF_s3_integrity;
> +     dom0 = domain_create(0, domcr_flags, 0);
>      if ( IS_ERR(dom0) || (alloc_dom0_vcpu0() == NULL) )
>          panic("Error creating domain 0");
>  
> -- 
> 1.8.3.1

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 0/8] pvh dom0....
  2014-03-24  8:57 ` [V8 PATCH 0/8] pvh dom0 Jan Beulich
@ 2014-03-24 21:36   ` Mukesh Rathor
  0 siblings, 0 replies; 51+ messages in thread
From: Mukesh Rathor @ 2014-03-24 21:36 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George.Dunlap, xen-devel, keir.xen, tim

On Mon, 24 Mar 2014 08:57:12 +0000
"Jan Beulich" <JBeulich@suse.com> wrote:

> >>> On 22.03.14 at 02:39, <mukesh.rathor@oracle.com> wrote:
....... 
> > Caveat:
> > 
> > With these patches, an HVM guest during boot intermittantly takes
> > unresolved fault in load_module() path. This for PVH dom0 and not <===
> > PV, and latest upstream kernel. I was not able to reproduce
> > with older stock fedora 19 kernel. So, it appears the problem might
> > be on the guest side, and not xen. Hence I decided to go ahead and
> > submit these while I debug that.
> 
> Without proper understanding of this I don't see a way for us to apply
> the series, and it should therefore have been tagged as RFC. Just to
> avoid any misunderstanding - even if the root cause was on the guest
> side, if without your patches the issue never surfaces, this would
> still need to be treated as a regression, and a way would need to be

Well, the problem doesn't happen with PV dom0, only PVH dom0, so not 
quite a regression IMO :).

> found to make those guests work flawlessly again. Furthermore, with
> the issue being in HVM guests, it is rather unlikely (but admittedly
> not entirely impossible) that the issue is in the guest, as similar
> problems aren't - aiui- known from Linux running on native hardware.
> 
> Apart from that, assuming you spent quite some time in isolating the
> problem - did you determine which of the patches in this series
> introduces it? Knowing that might help in determining possible reasons

No, since the problem only occurs on PVH dom0 and not PV dom0, I need 
all patches to boot dom0 in PVH mode and be able to create guests on it.

Will continue debug it.

thanks a lot for your time,
Mukesh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 4/8] pvh dom0: make xsm_map_gmfn_foreign available for x86
  2014-03-22  1:39 ` [V8 PATCH 4/8] pvh dom0: make xsm_map_gmfn_foreign available for x86 Mukesh Rathor
@ 2014-03-25 17:53   ` Daniel De Graaf
  0 siblings, 0 replies; 51+ messages in thread
From: Daniel De Graaf @ 2014-03-25 17:53 UTC (permalink / raw)
  To: Mukesh Rathor, xen-devel
  Cc: George.Dunlap, keir.xen, tim, JBeulich, julien.grall

On 03/21/2014 09:39 PM, Mukesh Rathor wrote:
> In this patch we make xsm_map_gmfn_foreign available for x86 also. This
> is used in the next patch "pvh dom0: Add and remove foreign pages" in
> function p2m_add_foreign.
>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>

I think the definition needs to be moved above the #ifdef CONFIG_X86
group now that it is common to X86 and ARM, so that the functions in
xsm_operations remain organized with common functions preceding the
architecture-specific functions.  Otherwise, adding ARM-only operations
in the future will need to move this function to leave a sane-looking
structure.

> ---
>   xen/include/xsm/dummy.h | 2 --
>   xen/include/xsm/xsm.h   | 4 ----
>   xen/xsm/dummy.c         | 2 --
>   xen/xsm/flask/hooks.c   | 4 ----
>   4 files changed, 12 deletions(-)
>
> diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
> index 3bcd941..eb7aeb8 100644
> --- a/xen/include/xsm/dummy.h
> +++ b/xen/include/xsm/dummy.h
> @@ -622,10 +622,8 @@ static XSM_INLINE int xsm_ioport_mapping(XSM_DEFAULT_ARG struct domain *d, uint3
>
>   #endif /* CONFIG_X86 */
>
> -#ifdef CONFIG_ARM
>   static XSM_INLINE int xsm_map_gmfn_foreign(XSM_DEFAULT_ARG struct domain *d, struct domain *t)
>   {
>       XSM_ASSERT_ACTION(XSM_TARGET);
>       return xsm_default_action(action, d, t);
>   }
> -#endif
> diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
> index de9cf86..6281e29 100644
> --- a/xen/include/xsm/xsm.h
> +++ b/xen/include/xsm/xsm.h
> @@ -164,9 +164,7 @@ struct xsm_operations {
>       int (*ioport_permission) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
>       int (*ioport_mapping) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
>   #endif
> -#ifdef CONFIG_ARM
>       int (*map_gmfn_foreign) (struct domain *d, struct domain *t);
> -#endif
>   };
>
>   #ifdef XSM_ENABLE
> @@ -630,12 +628,10 @@ static inline int xsm_ioport_mapping (xsm_default_t def, struct domain *d, uint3
>   }
>   #endif /* CONFIG_X86 */
>
> -#ifdef CONFIG_ARM
>   static inline int xsm_map_gmfn_foreign (xsm_default_t def, struct domain *d, struct domain *t)
>   {
>       return xsm_ops->map_gmfn_foreign(d, t);
>   }
> -#endif /* CONFIG_ARM */
>
>   #endif /* XSM_NO_WRAPPERS */
>
> diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
> index 3fe4c59..0234490 100644
> --- a/xen/xsm/dummy.c
> +++ b/xen/xsm/dummy.c
> @@ -134,7 +134,5 @@ void xsm_fixup_ops (struct xsm_operations *ops)
>       set_to_dummy_if_null(ops, ioport_permission);
>       set_to_dummy_if_null(ops, ioport_mapping);
>   #endif
> -#ifdef CONFIG_ARM
>       set_to_dummy_if_null(ops, map_gmfn_foreign);
> -#endif
>   }
> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
> index 96276ac..df92953 100644
> --- a/xen/xsm/flask/hooks.c
> +++ b/xen/xsm/flask/hooks.c
> @@ -1450,12 +1450,10 @@ static int flask_unbind_pt_irq (struct domain *d, struct xen_domctl_bind_pt_irq
>   }
>   #endif /* CONFIG_X86 */
>
> -#ifdef CONFIG_ARM
>   static int flask_map_gmfn_foreign(struct domain *d, struct domain *t)
>   {
>       return domain_has_perm(d, t, SECCLASS_MMU, MMU__MAP_READ | MMU__MAP_WRITE);
>   }
> -#endif
>
>   long do_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
>   int compat_flask_op(XEN_GUEST_HANDLE_PARAM(xsm_op_t) u_flask_op);
> @@ -1566,9 +1564,7 @@ static struct xsm_operations flask_ops = {
>       .ioport_permission = flask_ioport_permission,
>       .ioport_mapping = flask_ioport_mapping,
>   #endif
> -#ifdef CONFIG_ARM
>       .map_gmfn_foreign = flask_map_gmfn_foreign,
> -#endif
>   };
>
>   static __init int flask_init(void)
>


-- 
Daniel De Graaf
National Security Agency

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 2/8] pvh dom0: construct_dom0 changes
  2014-03-22  1:39 ` [V8 PATCH 2/8] pvh dom0: construct_dom0 changes Mukesh Rathor
@ 2014-03-26 19:05   ` George Dunlap
  2014-03-27 10:14     ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: George Dunlap @ 2014-03-26 19:05 UTC (permalink / raw)
  To: Mukesh Rathor, xen-devel; +Cc: keir.xen, tim, JBeulich

On 03/22/2014 01:39 AM, Mukesh Rathor wrote:
> This patch changes construct_dom0() to boot in PVH mode.

[Apologies in advance for not following the previous review cycles.]

This needs a better description.  Minimally:

* Call guest_physmap_add_range rather than simple physmap setting for 
pvh guests.
* Map all non-RAM regions for dom0 as 1:1 up to 4GiB, so dom0 has IO 
memory mapped.
* If pvh was specified, make sure that dom0 supports pvh mode
* Allocate p2m pages, copying calculation from toolstack
* Make space for the shared info in the p2m (?)
* [A description of what "fixup_page_tables_for_hap" does]

> +    /*
> +     * pvh: we temporarily disable paging mode so that we can build cr3 needed
> +     * to run on dom0's page tables.
> +     */
> +    save_pvh_pg_mode = d->arch.paging.mode;
> +    d->arch.paging.mode = 0;
> +

What on earth is this about?  Setting d->arch.paging.mode to zero isn't 
actually disabling paging; it's just making people *thing* paging is 
disabled.  AFAICT paging should be enabled at this point anyway, should it?

  
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index b8c5422..878697e 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -589,6 +589,21 @@ int hap_domctl(struct domain *d, xen_domctl_shadow_op_t *sc,
      }
  }
  
+void __init hap_set_pvh_alloc_for_dom0(struct domain *d,
+                                       unsigned long num_pages)
+{
+    int rc;
+    unsigned long memkb = num_pages * (PAGE_SIZE / 1024);
+
+    /* Copied from: libxl_get_required_shadow_memory() */
+    memkb = 4 * (256 * d->max_vcpus + 2 * (memkb / 1024));
+    num_pages = ( (memkb + 1023) / 1024) << (20 - PAGE_SHIFT);
+    paging_lock(d);
+    rc = hap_set_allocation(d, num_pages, NULL);
+    paging_unlock(d);


The calculation for how many pages of shadow memory are needed logically 
belongs in domain_build.c, not hap.c.

  -George

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 2/8] pvh dom0: construct_dom0 changes
  2014-03-26 19:05   ` George Dunlap
@ 2014-03-27 10:14     ` Jan Beulich
  2014-03-27 10:55       ` George Dunlap
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2014-03-27 10:14 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, keir.xen, tim

>>> On 26.03.14 at 20:05, <george.dunlap@eu.citrix.com> wrote:
> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -589,6 +589,21 @@ int hap_domctl(struct domain *d, xen_domctl_shadow_op_t *sc,
>       }
>   }
>   
> +void __init hap_set_pvh_alloc_for_dom0(struct domain *d,
> +                                       unsigned long num_pages)
> +{
> +    int rc;
> +    unsigned long memkb = num_pages * (PAGE_SIZE / 1024);
> +
> +    /* Copied from: libxl_get_required_shadow_memory() */
> +    memkb = 4 * (256 * d->max_vcpus + 2 * (memkb / 1024));
> +    num_pages = ( (memkb + 1023) / 1024) << (20 - PAGE_SHIFT);
> +    paging_lock(d);
> +    rc = hap_set_allocation(d, num_pages, NULL);
> +    paging_unlock(d);
> 
> 
> The calculation for how many pages of shadow memory are needed logically 
> belongs in domain_build.c, not hap.c.

Iirc it was me requesting it to be moved here, on the basis that the
calculation should match the one for other domains, and hence be
done alongside that for other domains. domain_build.c shouldn't
carry HAP-specific knowledge imo.

Jan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 2/8] pvh dom0: construct_dom0 changes
  2014-03-27 10:14     ` Jan Beulich
@ 2014-03-27 10:55       ` George Dunlap
  2014-03-27 11:03         ` George Dunlap
  2014-03-27 15:04         ` Jan Beulich
  0 siblings, 2 replies; 51+ messages in thread
From: George Dunlap @ 2014-03-27 10:55 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, keir.xen, tim

On 03/27/2014 10:14 AM, Jan Beulich wrote:
>>>> On 26.03.14 at 20:05, <george.dunlap@eu.citrix.com> wrote:
>> --- a/xen/arch/x86/mm/hap/hap.c
>> +++ b/xen/arch/x86/mm/hap/hap.c
>> @@ -589,6 +589,21 @@ int hap_domctl(struct domain *d, xen_domctl_shadow_op_t *sc,
>>        }
>>    }
>>    
>> +void __init hap_set_pvh_alloc_for_dom0(struct domain *d,
>> +                                       unsigned long num_pages)
>> +{
>> +    int rc;
>> +    unsigned long memkb = num_pages * (PAGE_SIZE / 1024);
>> +
>> +    /* Copied from: libxl_get_required_shadow_memory() */
>> +    memkb = 4 * (256 * d->max_vcpus + 2 * (memkb / 1024));
>> +    num_pages = ( (memkb + 1023) / 1024) << (20 - PAGE_SHIFT);
>> +    paging_lock(d);
>> +    rc = hap_set_allocation(d, num_pages, NULL);
>> +    paging_unlock(d);
>>
>>
>> The calculation for how many pages of shadow memory are needed logically
>> belongs in domain_build.c, not hap.c.
> Iirc it was me requesting it to be moved here, on the basis that the
> calculation should match the one for other domains, and hence be
> done alongside that for other domains. domain_build.c shouldn't
> carry HAP-specific knowledge imo.

I thought that might be the case, which is why I apologized in advance 
for not reading the previous thread.

The calculation should indeed match that done for other domains; 
however, as the comment clearly says, this calculation is done in libxl, 
not in Xen at all.  Everything done to build dom0 in Xen which 
corresponds to things done by the toolstack to build a domU logically 
belongs in domain_build.c.

Putting it in hap.c would be wrong in any case; what would you do when 
we enable shadow mode for HAP dom0?

  -George

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 2/8] pvh dom0: construct_dom0 changes
  2014-03-27 10:55       ` George Dunlap
@ 2014-03-27 11:03         ` George Dunlap
  2014-03-27 15:04         ` Jan Beulich
  1 sibling, 0 replies; 51+ messages in thread
From: George Dunlap @ 2014-03-27 11:03 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, keir.xen, Tim Deegan

On Thu, Mar 27, 2014 at 10:55 AM, George Dunlap
<george.dunlap@eu.citrix.com> wrote:
> Putting it in hap.c would be wrong in any case; what would you do when we
> enable shadow mode for HAP dom0?

That should obviously be "shadow mode for a PVH dom0"....

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 3/8] pvh dom0: Introduce p2m_map_foreign
  2014-03-22  1:39 ` [V8 PATCH 3/8] pvh dom0: Introduce p2m_map_foreign Mukesh Rathor
  2014-03-24  9:00   ` Jan Beulich
@ 2014-03-27 12:29   ` George Dunlap
  2014-04-05  0:57     ` Mukesh Rathor
  1 sibling, 1 reply; 51+ messages in thread
From: George Dunlap @ 2014-03-27 12:29 UTC (permalink / raw)
  To: Mukesh Rathor, xen-devel; +Cc: keir.xen, tim, JBeulich

On 03/22/2014 01:39 AM, Mukesh Rathor wrote:
> In this patch, a new type p2m_map_foreign is introduced for pages
> that toolstack on an auto translated dom0 or a control domain maps
> from foreign domains that its creating or supporting during it's
> run time.
>
> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
> Acked-by: Tim Deegan <tim@xen.org>

Overall looks good, just a couple of comments:

> diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> index c0ddef0..7050f6a 100644
> --- a/xen/arch/x86/mm/p2m.c
> +++ b/xen/arch/x86/mm/p2m.c
> @@ -492,7 +492,7 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long gfn, unsigned long mfn,
>           for ( i = 0; i < (1UL << page_order); i++ )
>           {
>               mfn_return = p2m->get_entry(p2m, gfn + i, &t, &a, 0, NULL);
> -            if ( !p2m_is_grant(t) && !p2m_is_shared(t) )
> +            if ( !p2m_is_grant(t) && !p2m_is_shared(t) && !p2m_is_foreign(t) )

What's the unifying charateristic of these three types?  At some point 
we should come up with a name for it, and make a function like 
"p2m_is_ram()", which returns true for all three of them.  (Not required 
for acceptance.)

> @@ -751,16 +750,31 @@ set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
>           set_gpfn_from_mfn(mfn_x(omfn), INVALID_M2P_ENTRY);
>       }
>   
> -    P2M_DEBUG("set mmio %lx %lx\n", gfn, mfn_x(mfn));
> -    rc = set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_mmio_direct, p2m->default_access);
> +    P2M_DEBUG("set %d %lx %lx\n", gfn_p2mt, gfn, mfn_x(mfn));
> +    rc = set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, gfn_p2mt,
> +                       p2m->default_access);
>       gfn_unlock(p2m, gfn, 0);
>       if ( 0 == rc )
>           gdprintk(XENLOG_ERR,
> -            "set_mmio_p2m_entry: set_p2m_entry failed! mfn=%08lx\n",
> +            "%s: set_p2m_entry failed! mfn=%08lx\n", __func__,
>               mfn_x(get_gfn_query_unlocked(p2m->domain, gfn, &ot)));
>       return rc;
>   }
>   
> +/* Set foreign mfn in the given guest's p2m table.
> + * Returns: True for success. */
> +static int __attribute__((unused))
> +set_foreign_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)

Why "unused"?  It appears that tag remains even at the end of the series.

  -George

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 2/8] pvh dom0: construct_dom0 changes
  2014-03-27 10:55       ` George Dunlap
  2014-03-27 11:03         ` George Dunlap
@ 2014-03-27 15:04         ` Jan Beulich
  2014-03-27 15:30           ` Tim Deegan
  1 sibling, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2014-03-27 15:04 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, keir.xen, tim

>>> On 27.03.14 at 11:55, <george.dunlap@eu.citrix.com> wrote:
> On 03/27/2014 10:14 AM, Jan Beulich wrote:
>>>>> On 26.03.14 at 20:05, <george.dunlap@eu.citrix.com> wrote:
>>> --- a/xen/arch/x86/mm/hap/hap.c
>>> +++ b/xen/arch/x86/mm/hap/hap.c
>>> @@ -589,6 +589,21 @@ int hap_domctl(struct domain *d, xen_domctl_shadow_op_t 
> *sc,
>>>        }
>>>    }
>>>    
>>> +void __init hap_set_pvh_alloc_for_dom0(struct domain *d,
>>> +                                       unsigned long num_pages)
>>> +{
>>> +    int rc;
>>> +    unsigned long memkb = num_pages * (PAGE_SIZE / 1024);
>>> +
>>> +    /* Copied from: libxl_get_required_shadow_memory() */
>>> +    memkb = 4 * (256 * d->max_vcpus + 2 * (memkb / 1024));
>>> +    num_pages = ( (memkb + 1023) / 1024) << (20 - PAGE_SHIFT);
>>> +    paging_lock(d);
>>> +    rc = hap_set_allocation(d, num_pages, NULL);
>>> +    paging_unlock(d);
>>>
>>>
>>> The calculation for how many pages of shadow memory are needed logically
>>> belongs in domain_build.c, not hap.c.
>> Iirc it was me requesting it to be moved here, on the basis that the
>> calculation should match the one for other domains, and hence be
>> done alongside that for other domains. domain_build.c shouldn't
>> carry HAP-specific knowledge imo.
> 
> I thought that might be the case, which is why I apologized in advance 
> for not reading the previous thread.
> 
> The calculation should indeed match that done for other domains; 
> however, as the comment clearly says, this calculation is done in libxl, 
> not in Xen at all.  Everything done to build dom0 in Xen which 
> corresponds to things done by the toolstack to build a domU logically 
> belongs in domain_build.c.
> 
> Putting it in hap.c would be wrong in any case; what would you do when 
> we enable shadow mode for HAP dom0?

The function calls hap_set_allocation(), so it's already HAP-specific.
I guess that was the main motivation to ask for it to be put there
(including the fact that this function is presently static, and should
preferably remain so).

Jan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 2/8] pvh dom0: construct_dom0 changes
  2014-03-27 15:04         ` Jan Beulich
@ 2014-03-27 15:30           ` Tim Deegan
  2014-04-05  0:53             ` Mukesh Rathor
  0 siblings, 1 reply; 51+ messages in thread
From: Tim Deegan @ 2014-03-27 15:30 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, xen-devel, keir.xen

At 15:04 +0000 on 27 Mar (1395929085), Jan Beulich wrote:
> >>> On 27.03.14 at 11:55, <george.dunlap@eu.citrix.com> wrote:
> > On 03/27/2014 10:14 AM, Jan Beulich wrote:
> >>>>> On 26.03.14 at 20:05, <george.dunlap@eu.citrix.com> wrote:
> >>> --- a/xen/arch/x86/mm/hap/hap.c
> >>> +++ b/xen/arch/x86/mm/hap/hap.c
> >>> @@ -589,6 +589,21 @@ int hap_domctl(struct domain *d, xen_domctl_shadow_op_t 
> > *sc,
> >>>        }
> >>>    }
> >>>    
> >>> +void __init hap_set_pvh_alloc_for_dom0(struct domain *d,
> >>> +                                       unsigned long num_pages)
> >>> +{
> >>> +    int rc;
> >>> +    unsigned long memkb = num_pages * (PAGE_SIZE / 1024);
> >>> +
> >>> +    /* Copied from: libxl_get_required_shadow_memory() */
> >>> +    memkb = 4 * (256 * d->max_vcpus + 2 * (memkb / 1024));
> >>> +    num_pages = ( (memkb + 1023) / 1024) << (20 - PAGE_SHIFT);
> >>> +    paging_lock(d);
> >>> +    rc = hap_set_allocation(d, num_pages, NULL);
> >>> +    paging_unlock(d);
> >>>
> >>>
> >>> The calculation for how many pages of shadow memory are needed logically
> >>> belongs in domain_build.c, not hap.c.
> >> Iirc it was me requesting it to be moved here, on the basis that the
> >> calculation should match the one for other domains, and hence be
> >> done alongside that for other domains. domain_build.c shouldn't
> >> carry HAP-specific knowledge imo.
> > 
> > I thought that might be the case, which is why I apologized in advance 
> > for not reading the previous thread.
> > 
> > The calculation should indeed match that done for other domains; 
> > however, as the comment clearly says, this calculation is done in libxl, 
> > not in Xen at all.  Everything done to build dom0 in Xen which 
> > corresponds to things done by the toolstack to build a domU logically 
> > belongs in domain_build.c.
> > 
> > Putting it in hap.c would be wrong in any case; what would you do when 
> > we enable shadow mode for HAP dom0?
> 
> The function calls hap_set_allocation(), so it's already HAP-specific.

It really ought to be calling a paging_set_allocation() wrapper;
there's nothing _inherently_ HAP-specific about PVH.  Right now that
wrapper doesn't exist, because that path goes via paging_domctl() for
other domains.

I'd be happy with the allocation runes being in x86/mm/paging.c or in
domain_build.c, with a mild preferecen for the latter.

Tim.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 0/8] pvh dom0....
  2014-03-22  1:39 [V8 PATCH 0/8] pvh dom0 Mukesh Rathor
                   ` (8 preceding siblings ...)
  2014-03-24  8:57 ` [V8 PATCH 0/8] pvh dom0 Jan Beulich
@ 2014-03-28 17:36 ` Roger Pau Monné
  2014-03-28 19:48   ` Mukesh Rathor
  2014-04-01 16:04 ` George Dunlap
  10 siblings, 1 reply; 51+ messages in thread
From: Roger Pau Monné @ 2014-03-28 17:36 UTC (permalink / raw)
  To: Mukesh Rathor, xen-devel; +Cc: George.Dunlap, keir.xen, tim, JBeulich

On 22/03/14 02:39, Mukesh Rathor wrote:
> Hi all,
> 
> Finally, please find V8 of dom0 PVH patches based on commit bc69aaf.
> 
>   git tree: git://oss.oracle.com/git/mrathor/xen.git  branch: dom0pvh-v8

Hello Mukesh,

Thanks for the patches, do you have the Linux side of them? (I think 
the only missing bit is the support for XENMEM_add_to_physmap_range).

Also, while testing them I've found that from time to time I would hit 
the following ASSERT on shutdown:

(XEN) Domain 0 shutdown: rebooting machine.
(XEN) Assertion 'read_cr0() & X86_CR0_TS' failed at vmx.c:644
(XEN) ----[ Xen-4.5-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d0801d90ce>] vmx_ctxt_switch_from+0x1e/0x14c
(XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
(XEN) rax: 0000000080050033   rbx: ffff8300dfb1c000   rcx: 0000000000000000
(XEN) rdx: 0000000000000000   rsi: ffff82d0802d7fc0   rdi: ffff8300dfb1c000
(XEN) rbp: ffff82d0802d7a88   rsp: ffff82d0802d7a78   r8:  0000000000000000
(XEN) r9:  ffff82cffffff000   r10: 0000000b06dca869   r11: 0000003d7d708160
(XEN) r12: ffff8300dfb1c000   r13: 0000000000000000   r14: ffff82d0802d0000
(XEN) r15: ffff82d0802d7f18   cr0: 0000000080050033   cr4: 00000000000026f0
(XEN) cr3: 000000019ed8d000   cr2: 0000000800dcb040
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff82d0802d7a78:
(XEN)    ffff8300dfdf2000 ffff8300dfdf2000 ffff82d0802d7ad8 ffff82d08015d129
(XEN)    fffffe003d272a38 ffff83019a3f9140 0000000000000000 0000000000000001
(XEN)    0000000000000086 000000000019a400 0000000000000000 0001000000010030
(XEN)    ffff82d0802d7af8 ffff82d080160acf ffff83019ec18530 ffff8300dfdf2000
(XEN)    ffff82d0802d7b08 ffff82d080160af9 ffff82d0802d7b58 ffff82d080161728
(XEN)    ffff82d0802d7b48 ffff82d08013ffbf 0000000000000002 ffff83019ec18530
(XEN)    0000000000000000 0000000000000012 0000000000000000 0001000000010030
(XEN)    ffff82d0802d7b68 ffff82d08014e721 ffff82d0802d7bc8 ffff82d08014cda2
(XEN)    ffff8300dfb1c000 0000000000000092 ffff83019ec18604 ffff83019ec185f8
(XEN)    ffff82d0802d0000 0000000000000001 0000000000000000 ffff82d08016560e
(XEN)    ffff82d080272860 0000000000000020 ffff82d0802d7bd8 ffff82d0801448a8
(XEN)    ffff82d0802d7be8 ffff82d080165625 ffff82d0802d7c18 ffff82d080166143
(XEN)    0000000000000000 0000000000000001 0000000000000000 ffff82d080272860
(XEN)    ffff82d0802d7c48 ffff82d080166aa8 ffff8300dfb1c060 0000000000010000
(XEN)    0000000000000001 ffff82d080272860 ffff82d0802d7c78 ffff82d080166bae
(XEN)    000000000000000a ffff82d080276fe0 00000000000000fb 0000000000000061
(XEN)    ffff82d0802d7c98 ffff82d080166f63 ffff82d0802d7c98 ffff82d0801821ff
(XEN)    ffff82d0802d7cb8 ffff82d08018228b 0000000000000000 ffff82d0802d7dd8
(XEN)    ffff82d0802d7cf8 ffff82d080181aa7 ffff82d0802d7d08 0000000000000206
(XEN)    0000000000000000 ffff82d0802d7dd8 00000000000000fb 0000000000000008
(XEN) Xen call trace:
(XEN)    [<ffff82d0801d90ce>] vmx_ctxt_switch_from+0x1e/0x14c
(XEN)    [<ffff82d08015d129>] __context_switch+0x127/0x462
(XEN)    [<ffff82d080160acf>] __sync_local_execstate+0x6a/0x8b
(XEN)    [<ffff82d080160af9>] sync_local_execstate+0x9/0xb
(XEN)    [<ffff82d080161728>] map_domain_page+0x88/0x4de
(XEN)    [<ffff82d08014e721>] map_vtd_domain_page+0xd/0xf
(XEN)    [<ffff82d08014cda2>] io_apic_read_remap_rte+0x158/0x29f
(XEN)    [<ffff82d0801448a8>] iommu_read_apic_from_ire+0x27/0x29
(XEN)    [<ffff82d080165625>] io_apic_read+0x17/0x65
(XEN)    [<ffff82d080166143>] __ioapic_read_entry+0x38/0x61
(XEN)    [<ffff82d080166aa8>] clear_IO_APIC_pin+0x1a/0xf3
(XEN)    [<ffff82d080166bae>] clear_IO_APIC+0x2d/0x60
(XEN)    [<ffff82d080166f63>] disable_IO_APIC+0xd/0x81
(XEN)    [<ffff82d08018228b>] smp_send_stop+0x58/0x68
(XEN)    [<ffff82d080181aa7>] machine_restart+0x80/0x20a
(XEN)    [<ffff82d080181c3c>] __machine_restart+0xb/0xf
(XEN)    [<ffff82d080128fb9>] smp_call_function_interrupt+0x99/0xc0
(XEN)    [<ffff82d080182330>] call_function_interrupt+0x33/0x43
(XEN)    [<ffff82d08016bd89>] do_IRQ+0x9e/0x63a
(XEN)    [<ffff82d08016406f>] common_interrupt+0x5f/0x70
(XEN)    [<ffff82d0801a8600>] mwait_idle+0x29c/0x2f7
(XEN)    [<ffff82d08015cf67>] idle_loop+0x58/0x76
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'read_cr0() & X86_CR0_TS' failed at vmx.c:644
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...

So I've added the following patch on top of yours, which seems to solve 
the issue (feel free to integrate it into your series if you think it's 
correct):

---
diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index 0433f30..0226d04 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -295,6 +295,7 @@ void __stop_this_cpu(void)
      */
     clts();
     asm volatile ( "fninit" );
+    stts();
 
     cpumask_clear_cpu(smp_processor_id(), &cpu_online_map);
 }

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 0/8] pvh dom0....
  2014-03-28 17:36 ` Roger Pau Monné
@ 2014-03-28 19:48   ` Mukesh Rathor
  0 siblings, 0 replies; 51+ messages in thread
From: Mukesh Rathor @ 2014-03-28 19:48 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: George.Dunlap, xen-devel, keir.xen, tim, JBeulich

[-- Attachment #1: Type: text/plain, Size: 4406 bytes --]

On Fri, 28 Mar 2014 18:36:07 +0100
Roger Pau Monné <roger.pau@citrix.com> wrote:

> On 22/03/14 02:39, Mukesh Rathor wrote:
> > Hi all,
> > 
> > Finally, please find V8 of dom0 PVH patches based on commit bc69aaf.
> > 
> >   git tree: git://oss.oracle.com/git/mrathor/xen.git  branch:
> > dom0pvh-v8
> 
> Hello Mukesh,
> 
> Thanks for the patches, do you have the Linux side of them? (I think 
> the only missing bit is the support for XENMEM_add_to_physmap_range).

Attached and linlined below what I've in my tree...

> Also, while testing them I've found that from time to time I would
> hit the following ASSERT on shutdown:

Ok, I'll add that to my series.

thanks Roger,
Mukesh



diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 256282e..9063d0a 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2511,6 +2511,91 @@ void __init xen_hvm_init_mmu_ops(void)
 }
 #endif
 
+/* 
+ * Map foreign gmfn, fgmfn, to local pfn, lpfn. This for the user space
+ * creating new guest on pvh dom0 and needing to map domU pages.
+ */
+static int autox_add_to_p2m(unsigned long lpfn, unsigned long fgmfn,
+			    unsigned int domid)
+{
+	int rc, err = 0;
+	xen_pfn_t gpfn = lpfn;
+	xen_ulong_t idx = fgmfn;
+
+	struct xen_add_to_physmap_range xatp = {
+		.domid = DOMID_SELF,
+		.foreign_domid = domid,
+		.size = 1,
+		.space = XENMAPSPACE_gmfn_foreign,
+	};
+	set_xen_guest_handle(xatp.idxs, &idx);
+	set_xen_guest_handle(xatp.gpfns, &gpfn);
+	set_xen_guest_handle(xatp.errs, &err);
+
+	rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
+	return rc;
+}
+
+static int autox_remove_from_p2m(unsigned long spfn, int count)
+{
+	struct xen_remove_from_physmap xrp;
+	int i, rc;
+
+	for (i = 0; i < count; i++) {
+		xrp.domid = DOMID_SELF;
+		xrp.gpfn = spfn+i;
+		rc = HYPERVISOR_memory_op(XENMEM_remove_from_physmap, &xrp);
+		if (rc)
+			break;
+	}
+	return rc;
+}
+
+struct autox_remap_data {
+	unsigned long fgmfn; /* foreign domain's gmfn */
+	pgprot_t prot;
+	domid_t  domid;
+	int index;
+	struct page **pages;
+};
+
+static int autox_map_pte_fn(pte_t *ptep, pgtable_t token, unsigned long addr,
+			    void *data)
+{
+	int rc;
+	struct autox_remap_data *remap = data;
+	unsigned long pfn = page_to_pfn(remap->pages[remap->index++]);
+	pte_t pteval = pte_mkspecial(pfn_pte(pfn, remap->prot));
+
+	rc = autox_add_to_p2m(pfn, remap->fgmfn, remap->domid);
+	if (rc)
+		return rc;
+	native_set_pte(ptep, pteval);
+
+	return 0;
+}
+
+static int autox_remap_gmfn_range(struct vm_area_struct *vma,
+				  unsigned long addr, unsigned long mfn,
+				  int nr, pgprot_t prot, unsigned domid,
+				  struct page **pages)
+{
+	int err;
+	struct autox_remap_data pvhdata;
+
+	BUG_ON(!pages);
+
+	pvhdata.fgmfn = mfn;
+	pvhdata.prot = prot;
+	pvhdata.domid = domid;
+	pvhdata.index = 0;
+	pvhdata.pages = pages;
+	err = apply_to_page_range(vma->vm_mm, addr, nr << PAGE_SHIFT,
+				  autox_map_pte_fn, &pvhdata);
+	flush_tlb_all();
+	return err;
+}
+
 #define REMAP_BATCH_SIZE 16
 
 struct remap_data {
@@ -2545,13 +2630,16 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
 	unsigned long range;
 	int err = 0;
 
-	if (xen_feature(XENFEAT_auto_translated_physmap))
-		return -EINVAL;
-
 	prot = __pgprot(pgprot_val(prot) | _PAGE_IOMAP);
 
 	BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP | VM_IO)));
 
+	if (xen_feature(XENFEAT_auto_translated_physmap)) {
+		/* We need to update the local page tables and the xen HAP */
+		return autox_remap_gmfn_range(vma, addr, mfn, nr, prot,
+					      domid, pages);
+	}
+
 	rmd.mfn = mfn;
 	rmd.prot = prot;
 
@@ -2589,6 +2677,18 @@ int xen_unmap_domain_mfn_range(struct vm_area_struct *vma,
 	if (!pages || !xen_feature(XENFEAT_auto_translated_physmap))
 		return 0;
 
-	return -EINVAL;
+	while (numpgs--) {
+
+		/* The mmu has already cleaned up the process mmu resources at
+		 * this point (lookup_address will return NULL). */
+		unsigned long pfn = page_to_pfn(pages[numpgs]);
+
+		autox_remove_from_p2m(pfn, 1);
+	}
+	/* We don't need to flush tlbs because as part of autox_remove_from_p2m,
+	 * the hypervisor will do tlb flushes after removing the p2m entries
+	 * from the EPT/NPT */
+
+	return 0;
 }
 EXPORT_SYMBOL_GPL(xen_unmap_domain_mfn_range);

[-- Attachment #2: linux.patch --]
[-- Type: text/x-patch, Size: 3554 bytes --]

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 256282e..9063d0a 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -2511,6 +2511,91 @@ void __init xen_hvm_init_mmu_ops(void)
 }
 #endif
 
+/* 
+ * Map foreign gmfn, fgmfn, to local pfn, lpfn. This for the user space
+ * creating new guest on pvh dom0 and needing to map domU pages.
+ */
+static int autox_add_to_p2m(unsigned long lpfn, unsigned long fgmfn,
+			    unsigned int domid)
+{
+	int rc, err = 0;
+	xen_pfn_t gpfn = lpfn;
+	xen_ulong_t idx = fgmfn;
+
+	struct xen_add_to_physmap_range xatp = {
+		.domid = DOMID_SELF,
+		.foreign_domid = domid,
+		.size = 1,
+		.space = XENMAPSPACE_gmfn_foreign,
+	};
+	set_xen_guest_handle(xatp.idxs, &idx);
+	set_xen_guest_handle(xatp.gpfns, &gpfn);
+	set_xen_guest_handle(xatp.errs, &err);
+
+	rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
+	return rc;
+}
+
+static int autox_remove_from_p2m(unsigned long spfn, int count)
+{
+	struct xen_remove_from_physmap xrp;
+	int i, rc;
+
+	for (i = 0; i < count; i++) {
+		xrp.domid = DOMID_SELF;
+		xrp.gpfn = spfn+i;
+		rc = HYPERVISOR_memory_op(XENMEM_remove_from_physmap, &xrp);
+		if (rc)
+			break;
+	}
+	return rc;
+}
+
+struct autox_remap_data {
+	unsigned long fgmfn; /* foreign domain's gmfn */
+	pgprot_t prot;
+	domid_t  domid;
+	int index;
+	struct page **pages;
+};
+
+static int autox_map_pte_fn(pte_t *ptep, pgtable_t token, unsigned long addr,
+			    void *data)
+{
+	int rc;
+	struct autox_remap_data *remap = data;
+	unsigned long pfn = page_to_pfn(remap->pages[remap->index++]);
+	pte_t pteval = pte_mkspecial(pfn_pte(pfn, remap->prot));
+
+	rc = autox_add_to_p2m(pfn, remap->fgmfn, remap->domid);
+	if (rc)
+		return rc;
+	native_set_pte(ptep, pteval);
+
+	return 0;
+}
+
+static int autox_remap_gmfn_range(struct vm_area_struct *vma,
+				  unsigned long addr, unsigned long mfn,
+				  int nr, pgprot_t prot, unsigned domid,
+				  struct page **pages)
+{
+	int err;
+	struct autox_remap_data pvhdata;
+
+	BUG_ON(!pages);
+
+	pvhdata.fgmfn = mfn;
+	pvhdata.prot = prot;
+	pvhdata.domid = domid;
+	pvhdata.index = 0;
+	pvhdata.pages = pages;
+	err = apply_to_page_range(vma->vm_mm, addr, nr << PAGE_SHIFT,
+				  autox_map_pte_fn, &pvhdata);
+	flush_tlb_all();
+	return err;
+}
+
 #define REMAP_BATCH_SIZE 16
 
 struct remap_data {
@@ -2545,13 +2630,16 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
 	unsigned long range;
 	int err = 0;
 
-	if (xen_feature(XENFEAT_auto_translated_physmap))
-		return -EINVAL;
-
 	prot = __pgprot(pgprot_val(prot) | _PAGE_IOMAP);
 
 	BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP | VM_IO)));
 
+	if (xen_feature(XENFEAT_auto_translated_physmap)) {
+		/* We need to update the local page tables and the xen HAP */
+		return autox_remap_gmfn_range(vma, addr, mfn, nr, prot,
+					      domid, pages);
+	}
+
 	rmd.mfn = mfn;
 	rmd.prot = prot;
 
@@ -2589,6 +2677,18 @@ int xen_unmap_domain_mfn_range(struct vm_area_struct *vma,
 	if (!pages || !xen_feature(XENFEAT_auto_translated_physmap))
 		return 0;
 
-	return -EINVAL;
+	while (numpgs--) {
+
+		/* The mmu has already cleaned up the process mmu resources at
+		 * this point (lookup_address will return NULL). */
+		unsigned long pfn = page_to_pfn(pages[numpgs]);
+
+		autox_remove_from_p2m(pfn, 1);
+	}
+	/* We don't need to flush tlbs because as part of autox_remove_from_p2m,
+	 * the hypervisor will do tlb flushes after removing the p2m entries
+	 * from the EPT/NPT */
+
+	return 0;
 }
 EXPORT_SYMBOL_GPL(xen_unmap_domain_mfn_range);

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 6/8] pvh dom0: allow get_pg_owner for translated domains
  2014-03-24  9:31   ` Jan Beulich
@ 2014-04-01 14:31     ` George Dunlap
  2014-04-05  0:59       ` Mukesh Rathor
  0 siblings, 1 reply; 51+ messages in thread
From: George Dunlap @ 2014-04-01 14:31 UTC (permalink / raw)
  To: Jan Beulich, Mukesh Rathor; +Cc: xen-devel, keir.xen, tim

On 03/24/2014 09:31 AM, Jan Beulich wrote:
>>>> On 22.03.14 at 02:39, <mukesh.rathor@oracle.com> wrote:
>> With pvh, a translated domain, get_pg_owner must allow foreign mappings.
>
> And what about HVM domains? Can you really safely drop this namely
> with the possible use of this function in an earlier patch of this series?
> (Without that new use, the change of course is safe assuming that
> PV domains are never translated; I'm not sure however whether this
> is being explicitly guaranteed anywhere these days, or just "a fact").

It's probably a better idea to make it "if( !is_pv_domain() || 
paging_mode_translate() )"  (Or some variant thereof.

  -George

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 7/8] pvh dom0: add check for pvh in vioapic_range
  2014-03-24  9:34   ` Jan Beulich
@ 2014-04-01 14:40     ` George Dunlap
  2014-04-01 15:09       ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: George Dunlap @ 2014-04-01 14:40 UTC (permalink / raw)
  To: Jan Beulich, Mukesh Rathor
  Cc: xen-devel, keir.xen, eddie.dong, jun.nakajima, tim

On 03/24/2014 09:34 AM, Jan Beulich wrote:
>>>> On 22.03.14 at 02:39, <mukesh.rathor@oracle.com> wrote:
>> --- a/xen/arch/x86/hvm/vioapic.c
>> +++ b/xen/arch/x86/hvm/vioapic.c
>> @@ -238,8 +238,13 @@ static int vioapic_write(
>>
>>   static int vioapic_range(struct vcpu *v, unsigned long addr)
>>   {
>> -    struct hvm_hw_vioapic *vioapic = domain_vioapic(v->domain);
>> +    struct hvm_hw_vioapic *vioapic;
>> +
>> +    /* pvh uses event channel callback */
>> +    if ( is_pvh_vcpu(v) )
>> +        return 0;
>>
>> +    vioapic = domain_vioapic(v->domain);
>
> I can see why the extra check is needed, but I can't see why you
> convert the initializer to an assignment: Afaict domain_vioapic() is
> safe even if d->arch.hvm_domain.vioapic == NULL.

Or better yet, just make it something like:

return vioapic && ((addr >= [...original range check]))

That way we don't have to have a PVH-specific hook at all.  If a domain 
doesn't have a vioapic for any reason, return 0.

  -George

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 7/8] pvh dom0: add check for pvh in vioapic_range
  2014-04-01 14:40     ` George Dunlap
@ 2014-04-01 15:09       ` Jan Beulich
  2014-04-05  1:00         ` Mukesh Rathor
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2014-04-01 15:09 UTC (permalink / raw)
  To: George Dunlap, Mukesh Rathor
  Cc: xen-devel, keir.xen, eddie.dong, jun.nakajima, tim

>>> On 01.04.14 at 16:40, <george.dunlap@eu.citrix.com> wrote:
> On 03/24/2014 09:34 AM, Jan Beulich wrote:
>>>>> On 22.03.14 at 02:39, <mukesh.rathor@oracle.com> wrote:
>>> --- a/xen/arch/x86/hvm/vioapic.c
>>> +++ b/xen/arch/x86/hvm/vioapic.c
>>> @@ -238,8 +238,13 @@ static int vioapic_write(
>>>
>>>   static int vioapic_range(struct vcpu *v, unsigned long addr)
>>>   {
>>> -    struct hvm_hw_vioapic *vioapic = domain_vioapic(v->domain);
>>> +    struct hvm_hw_vioapic *vioapic;
>>> +
>>> +    /* pvh uses event channel callback */
>>> +    if ( is_pvh_vcpu(v) )
>>> +        return 0;
>>>
>>> +    vioapic = domain_vioapic(v->domain);
>>
>> I can see why the extra check is needed, but I can't see why you
>> convert the initializer to an assignment: Afaict domain_vioapic() is
>> safe even if d->arch.hvm_domain.vioapic == NULL.
> 
> Or better yet, just make it something like:
> 
> return vioapic && ((addr >= [...original range check]))
> 
> That way we don't have to have a PVH-specific hook at all.  If a domain 
> doesn't have a vioapic for any reason, return 0.

No, vioapic isn't going to be NULL for PVH:

#define domain_vioapic(d) (&(d)->arch.hvm_domain.vioapic->hvm_hw_vioapic)

Jan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 0/8] pvh dom0....
  2014-03-22  1:39 [V8 PATCH 0/8] pvh dom0 Mukesh Rathor
                   ` (9 preceding siblings ...)
  2014-03-28 17:36 ` Roger Pau Monné
@ 2014-04-01 16:04 ` George Dunlap
  2014-04-02  1:22   ` Mukesh Rathor
  10 siblings, 1 reply; 51+ messages in thread
From: George Dunlap @ 2014-04-01 16:04 UTC (permalink / raw)
  To: Mukesh Rathor, xen-devel; +Cc: keir.xen, tim, JBeulich

On 03/22/2014 01:39 AM, Mukesh Rathor wrote:
> Hi all,
>
> Finally, please find V8 of dom0 PVH patches based on commit bc69aaf.
>
>    git tree: git://oss.oracle.com/git/mrathor/xen.git  branch: dom0pvh-v8

Thanks Mukesh -- I've gone through the series and don't have any more 
comments.

  -George

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 0/8] pvh dom0....
  2014-04-01 16:04 ` George Dunlap
@ 2014-04-02  1:22   ` Mukesh Rathor
  0 siblings, 0 replies; 51+ messages in thread
From: Mukesh Rathor @ 2014-04-02  1:22 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, keir.xen, tim, JBeulich

On Tue, 1 Apr 2014 17:04:17 +0100
George Dunlap <george.dunlap@eu.citrix.com> wrote:

> On 03/22/2014 01:39 AM, Mukesh Rathor wrote:
> > Hi all,
> >
> > Finally, please find V8 of dom0 PVH patches based on commit bc69aaf.
> >
> >    git tree: git://oss.oracle.com/git/mrathor/xen.git  branch:
> > dom0pvh-v8
> 
> Thanks Mukesh -- I've gone through the series and don't have any more 
> comments.

Thanks. I'm still struggling with trying to get a stable environment to 
build/test out the patches. The latest on that is: linux cs 683b6c6
booting on xen cs 9011c26 hangs in linux update_wall_time() during
boot. Thus hanging PV dom0 boot. This without my PVH patches in xen.
I've had issues with just about any verions of linux I've gone back to
upto Jan 22nd. 

Anyways, at this point, I'm gonna let someone else debug that, and
try going back to my very old 3.12 linux tree so I can focus on xen
pvh dom0 patches.

thanks
Mukesh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 2/8] pvh dom0: construct_dom0 changes
  2014-03-27 15:30           ` Tim Deegan
@ 2014-04-05  0:53             ` Mukesh Rathor
  2014-04-07  7:30               ` Jan Beulich
  2014-04-07  9:27               ` George Dunlap
  0 siblings, 2 replies; 51+ messages in thread
From: Mukesh Rathor @ 2014-04-05  0:53 UTC (permalink / raw)
  To: Tim Deegan; +Cc: George Dunlap, xen-devel, keir.xen, Jan Beulich

On Thu, 27 Mar 2014 16:30:37 +0100
Tim Deegan <tim@xen.org> wrote:

> At 15:04 +0000 on 27 Mar (1395929085), Jan Beulich wrote:
> > >>> On 27.03.14 at 11:55, <george.dunlap@eu.citrix.com> wrote:
> > > On 03/27/2014 10:14 AM, Jan Beulich wrote:
> > >>>>> On 26.03.14 at 20:05, <george.dunlap@eu.citrix.com> wrote:
> > >>> --- a/xen/arch/x86/mm/hap/hap.c
> > >>> +++ b/xen/arch/x86/mm/hap/hap.c
> > >>> @@ -589,6 +589,21 @@ int hap_domctl(struct domain *d,
> > >>> xen_domctl_shadow_op_t 
> > > *sc,
> > >>>        }
> > >>>    }
> > >>>    
> > >>> +void __init hap_set_pvh_alloc_for_dom0(struct domain *d,
> > >>> +                                       unsigned long num_pages)
> > >>> +{
> > >>> +    int rc;
> > >>> +    unsigned long memkb = num_pages * (PAGE_SIZE / 1024);
> > >>> +
> > >>> +    /* Copied from: libxl_get_required_shadow_memory() */
> > >>> +    memkb = 4 * (256 * d->max_vcpus + 2 * (memkb / 1024));
> > >>> +    num_pages = ( (memkb + 1023) / 1024) << (20 - PAGE_SHIFT);
> > >>> +    paging_lock(d);
> > >>> +    rc = hap_set_allocation(d, num_pages, NULL);
> > >>> +    paging_unlock(d);
> > >>>
> > >>>
> > >>> The calculation for how many pages of shadow memory are needed
> > >>> logically belongs in domain_build.c, not hap.c.
> > >> Iirc it was me requesting it to be moved here, on the basis that
> > >> the calculation should match the one for other domains, and
> > >> hence be done alongside that for other domains. domain_build.c
> > >> shouldn't carry HAP-specific knowledge imo.
> > > 
> > > I thought that might be the case, which is why I apologized in
> > > advance for not reading the previous thread.
> > > 
> > > The calculation should indeed match that done for other domains; 
> > > however, as the comment clearly says, this calculation is done in
> > > libxl, not in Xen at all.  Everything done to build dom0 in Xen
> > > which corresponds to things done by the toolstack to build a domU
> > > logically belongs in domain_build.c.
> > > 
> > > Putting it in hap.c would be wrong in any case; what would you do
> > > when we enable shadow mode for HAP dom0?
> > 
> > The function calls hap_set_allocation(), so it's already
> > HAP-specific.
> 
> It really ought to be calling a paging_set_allocation() wrapper;
> there's nothing _inherently_ HAP-specific about PVH.  Right now that
> wrapper doesn't exist, because that path goes via paging_domctl() for
> other domains.

Correct, PVH requires HAP at present, and I like when the code makes
that clear. In general, my approach is to make change in future when
demanded, thus, when shadow is added, it can be rearranged with small
effort. 

Anyways, I'm flexible. Jan, final word: Leave it as is (my first pref),
move to domain_build.c, or put a wrapper(my last preference) ?


thanks,
Mukesh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 3/8] pvh dom0: Introduce p2m_map_foreign
  2014-03-27 12:29   ` George Dunlap
@ 2014-04-05  0:57     ` Mukesh Rathor
  0 siblings, 0 replies; 51+ messages in thread
From: Mukesh Rathor @ 2014-04-05  0:57 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, keir.xen, tim, JBeulich

On Thu, 27 Mar 2014 12:29:03 +0000
George Dunlap <george.dunlap@eu.citrix.com> wrote:

> On 03/22/2014 01:39 AM, Mukesh Rathor wrote:
> > @@ -751,16 +750,31 @@ set_mmio_p2m_entry(struct domain *d, unsigned
> > long gfn, mfn_t mfn) set_gpfn_from_mfn(mfn_x(omfn),
> > INVALID_M2P_ENTRY); }
.......
> > -    P2M_DEBUG("set mmio %lx %lx\n", gfn, mfn_x(mfn));
> > -    rc = set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K,
> > p2m_mmio_direct, p2m->default_access);
> > +    P2M_DEBUG("set %d %lx %lx\n", gfn_p2mt, gfn, mfn_x(mfn));
> > +    rc = set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, gfn_p2mt,
> > +                       p2m->default_access);
> >       gfn_unlock(p2m, gfn, 0);
> >       if ( 0 == rc )
> >           gdprintk(XENLOG_ERR,
> > -            "set_mmio_p2m_entry: set_p2m_entry failed!
> > mfn=%08lx\n",
> > +            "%s: set_p2m_entry failed! mfn=%08lx\n", __func__,
> >               mfn_x(get_gfn_query_unlocked(p2m->domain, gfn, &ot)));
> >       return rc;
> >   }
> >   
> > +/* Set foreign mfn in the given guest's p2m table.
> > + * Returns: True for success. */
> > +static int __attribute__((unused))
> > +set_foreign_p2m_entry(struct domain *d, unsigned long gfn, mfn_t
> > mfn)
> 
> Why "unused"?  It appears that tag remains even at the end of the
> series.

Ooops, meant to undo "unused" in patch 5 where its being called. I
am surprised the compiler didn't complain on that.... 

anyways, will fix it to undo in patch 5 so patch 3 can compile.

thanks
mukesh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 6/8] pvh dom0: allow get_pg_owner for translated domains
  2014-04-01 14:31     ` George Dunlap
@ 2014-04-05  0:59       ` Mukesh Rathor
  0 siblings, 0 replies; 51+ messages in thread
From: Mukesh Rathor @ 2014-04-05  0:59 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel, keir.xen, tim, Jan Beulich

On Tue, 1 Apr 2014 15:31:32 +0100
George Dunlap <george.dunlap@eu.citrix.com> wrote:

> On 03/24/2014 09:31 AM, Jan Beulich wrote:
> >>>> On 22.03.14 at 02:39, <mukesh.rathor@oracle.com> wrote:
> >> With pvh, a translated domain, get_pg_owner must allow foreign
> >> mappings.
> >
> > And what about HVM domains? Can you really safely drop this namely
> > with the possible use of this function in an earlier patch of this
> > series? (Without that new use, the change of course is safe
> > assuming that PV domains are never translated; I'm not sure however
> > whether this is being explicitly guaranteed anywhere these days, or
> > just "a fact").
> 
> It's probably a better idea to make it "if( !is_pv_domain() || 
> paging_mode_translate() )"  (Or some variant thereof.

Yup, thats the way it was in the earlier patches, not sure how/when
it got changed. will put pvh check back there.

thanks
mukesh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 7/8] pvh dom0: add check for pvh in vioapic_range
  2014-04-01 15:09       ` Jan Beulich
@ 2014-04-05  1:00         ` Mukesh Rathor
  2014-04-07  6:59           ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: Mukesh Rathor @ 2014-04-05  1:00 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, tim, eddie.dong, keir.xen, jun.nakajima, xen-devel

On Tue, 01 Apr 2014 16:09:15 +0100
"Jan Beulich" <JBeulich@suse.com> wrote:

> >>> On 01.04.14 at 16:40, <george.dunlap@eu.citrix.com> wrote:
> > On 03/24/2014 09:34 AM, Jan Beulich wrote:
> >>>>> On 22.03.14 at 02:39, <mukesh.rathor@oracle.com> wrote:
> >>> --- a/xen/arch/x86/hvm/vioapic.c
> >>> +++ b/xen/arch/x86/hvm/vioapic.c
> >>> @@ -238,8 +238,13 @@ static int vioapic_write(
> >>>
> >>>   static int vioapic_range(struct vcpu *v, unsigned long addr)
> >>>   {
> >>> -    struct hvm_hw_vioapic *vioapic = domain_vioapic(v->domain);
> >>> +    struct hvm_hw_vioapic *vioapic;
> >>> +
> >>> +    /* pvh uses event channel callback */
> >>> +    if ( is_pvh_vcpu(v) )
> >>> +        return 0;
> >>>
> >>> +    vioapic = domain_vioapic(v->domain);
> >>
> >> I can see why the extra check is needed, but I can't see why you
> >> convert the initializer to an assignment: Afaict domain_vioapic()
> >> is safe even if d->arch.hvm_domain.vioapic == NULL.
> > 
> > Or better yet, just make it something like:
> > 
> > return vioapic && ((addr >= [...original range check]))
> > 
> > That way we don't have to have a PVH-specific hook at all.  If a
> > domain doesn't have a vioapic for any reason, return 0.
> 
> No, vioapic isn't going to be NULL for PVH:
> 
> #define domain_vioapic(d)
> (&(d)->arch.hvm_domain.vioapic->hvm_hw_vioapic)

No, viopaic is NULL for PVH, hence the patch. So, can prob just check
for the ptr like George suggests and remove the pvh check.

thanks
mukesh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 5/8] pvh dom0: Add and remove foreign pages
  2014-03-24  9:26   ` Jan Beulich
@ 2014-04-05  1:17     ` Mukesh Rathor
  2014-04-07  6:57       ` Jan Beulich
  2014-04-11  1:33     ` Mukesh Rathor
  1 sibling, 1 reply; 51+ messages in thread
From: Mukesh Rathor @ 2014-04-05  1:17 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George.Dunlap, tim, eddie.dong, keir.xen, jun.nakajima, xen-devel

On Mon, 24 Mar 2014 09:26:58 +0000
"Jan Beulich" <JBeulich@suse.com> wrote:

> >>> On 22.03.14 at 02:39, <mukesh.rathor@oracle.com> wrote:
> > +static inline void atomic_write_ept_entry(ept_entry_t *entryptr,
......
> > because it
> > +     * will update the m2p table which will result in  mfn -> gpfn
> > of dom0
> > +     * and not fgfn of domU.
> > +     */
> > +    if ( set_foreign_p2m_entry(tdom, gpfn, _mfn(mfn)) == 0 )
> > +        gdprintk(XENLOG_WARNING, "set_foreign_p2m_entry failed. "
> > +                 "gpfn:%lx mfn:%lx fgfn:%lx td:%d fd:%d\n",
> > +                 gpfn, mfn, fgfn, tdom->domain_id,
> > fdom->domain_id);
> > +    else
> > +        rc = 0;
> 
> Can't you make set_foreign_p2m_entry() return a proper error code
> (if it doesn't already) and use that here instead of the relatively
> meaningless -EINVAL inherited from above (I suppose the function
> could e.g. also return -ENOMEM)?

Well, I wsa trying to keep set_foreign_p2m_entry symmetrical with
set_mmio_p2m_entry as they both call the common function 
set_typed_p2m_entry which returns 0 for failure. But, no reason
why I can't break the symmetry and flip that in set_foreign_p2m_entry.
It could return -EINVAL for error, since could be any reason for 
failure, invalid type or access being more likely. That is not propogated 
back up from set_p2m_entry() unfortunately.

thanks
Mukesh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 5/8] pvh dom0: Add and remove foreign pages
  2014-04-05  1:17     ` Mukesh Rathor
@ 2014-04-07  6:57       ` Jan Beulich
  2014-04-08  1:11         ` Mukesh Rathor
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2014-04-07  6:57 UTC (permalink / raw)
  To: Mukesh Rathor
  Cc: George.Dunlap, tim, eddie.dong, keir.xen, jun.nakajima, xen-devel

>>> On 05.04.14 at 03:17, <mukesh.rathor@oracle.com> wrote:
> On Mon, 24 Mar 2014 09:26:58 +0000
> "Jan Beulich" <JBeulich@suse.com> wrote:
> 
>> >>> On 22.03.14 at 02:39, <mukesh.rathor@oracle.com> wrote:
>> > +static inline void atomic_write_ept_entry(ept_entry_t *entryptr,
> ......
>> > because it
>> > +     * will update the m2p table which will result in  mfn -> gpfn
>> > of dom0
>> > +     * and not fgfn of domU.
>> > +     */
>> > +    if ( set_foreign_p2m_entry(tdom, gpfn, _mfn(mfn)) == 0 )
>> > +        gdprintk(XENLOG_WARNING, "set_foreign_p2m_entry failed. "
>> > +                 "gpfn:%lx mfn:%lx fgfn:%lx td:%d fd:%d\n",
>> > +                 gpfn, mfn, fgfn, tdom->domain_id,
>> > fdom->domain_id);
>> > +    else
>> > +        rc = 0;
>> 
>> Can't you make set_foreign_p2m_entry() return a proper error code
>> (if it doesn't already) and use that here instead of the relatively
>> meaningless -EINVAL inherited from above (I suppose the function
>> could e.g. also return -ENOMEM)?
> 
> Well, I wsa trying to keep set_foreign_p2m_entry symmetrical with
> set_mmio_p2m_entry as they both call the common function 
> set_typed_p2m_entry which returns 0 for failure. But, no reason
> why I can't break the symmetry and flip that in set_foreign_p2m_entry.
> It could return -EINVAL for error, since could be any reason for 
> failure, invalid type or access being more likely. That is not propogated 
> back up from set_p2m_entry() unfortunately.

In the end I think we'll want to make them all report proper error
codes...

Jan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 7/8] pvh dom0: add check for pvh in vioapic_range
  2014-04-05  1:00         ` Mukesh Rathor
@ 2014-04-07  6:59           ` Jan Beulich
  2014-04-07  9:28             ` George Dunlap
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2014-04-07  6:59 UTC (permalink / raw)
  To: Mukesh Rathor
  Cc: George Dunlap, tim, eddie.dong, keir.xen, jun.nakajima, xen-devel

>>> On 05.04.14 at 03:00, <mukesh.rathor@oracle.com> wrote:
> On Tue, 01 Apr 2014 16:09:15 +0100
> "Jan Beulich" <JBeulich@suse.com> wrote:
> 
>> >>> On 01.04.14 at 16:40, <george.dunlap@eu.citrix.com> wrote:
>> > On 03/24/2014 09:34 AM, Jan Beulich wrote:
>> >>>>> On 22.03.14 at 02:39, <mukesh.rathor@oracle.com> wrote:
>> >>> --- a/xen/arch/x86/hvm/vioapic.c
>> >>> +++ b/xen/arch/x86/hvm/vioapic.c
>> >>> @@ -238,8 +238,13 @@ static int vioapic_write(
>> >>>
>> >>>   static int vioapic_range(struct vcpu *v, unsigned long addr)
>> >>>   {
>> >>> -    struct hvm_hw_vioapic *vioapic = domain_vioapic(v->domain);
>> >>> +    struct hvm_hw_vioapic *vioapic;
>> >>> +
>> >>> +    /* pvh uses event channel callback */
>> >>> +    if ( is_pvh_vcpu(v) )
>> >>> +        return 0;
>> >>>
>> >>> +    vioapic = domain_vioapic(v->domain);
>> >>
>> >> I can see why the extra check is needed, but I can't see why you
>> >> convert the initializer to an assignment: Afaict domain_vioapic()
>> >> is safe even if d->arch.hvm_domain.vioapic == NULL.
>> > 
>> > Or better yet, just make it something like:
>> > 
>> > return vioapic && ((addr >= [...original range check]))
>> > 
>> > That way we don't have to have a PVH-specific hook at all.  If a
>> > domain doesn't have a vioapic for any reason, return 0.
>> 
>> No, vioapic isn't going to be NULL for PVH:
>> 
>> #define domain_vioapic(d)
>> (&(d)->arch.hvm_domain.vioapic->hvm_hw_vioapic)
> 
> No, viopaic is NULL for PVH, hence the patch. So, can prob just check
> for the ptr like George suggests and remove the pvh check.

Sorry, no - I agree that d->arch.hvm_domain.vioapic is NULL for
PVH, but that doesn't mean the result of domain_vioapic(d) is too
(see the quoted #define above).

Jan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 2/8] pvh dom0: construct_dom0 changes
  2014-04-05  0:53             ` Mukesh Rathor
@ 2014-04-07  7:30               ` Jan Beulich
  2014-04-07  9:27               ` George Dunlap
  1 sibling, 0 replies; 51+ messages in thread
From: Jan Beulich @ 2014-04-07  7:30 UTC (permalink / raw)
  To: Mukesh Rathor, Tim Deegan; +Cc: George Dunlap, xen-devel, keir.xen

>>> On 05.04.14 at 02:53, <mukesh.rathor@oracle.com> wrote:
> On Thu, 27 Mar 2014 16:30:37 +0100
> Tim Deegan <tim@xen.org> wrote:
> 
>> At 15:04 +0000 on 27 Mar (1395929085), Jan Beulich wrote:
>> > >>> On 27.03.14 at 11:55, <george.dunlap@eu.citrix.com> wrote:
>> > > On 03/27/2014 10:14 AM, Jan Beulich wrote:
>> > >>>>> On 26.03.14 at 20:05, <george.dunlap@eu.citrix.com> wrote:
>> > >>> --- a/xen/arch/x86/mm/hap/hap.c
>> > >>> +++ b/xen/arch/x86/mm/hap/hap.c
>> > >>> @@ -589,6 +589,21 @@ int hap_domctl(struct domain *d,
>> > >>> xen_domctl_shadow_op_t 
>> > > *sc,
>> > >>>        }
>> > >>>    }
>> > >>>    
>> > >>> +void __init hap_set_pvh_alloc_for_dom0(struct domain *d,
>> > >>> +                                       unsigned long num_pages)
>> > >>> +{
>> > >>> +    int rc;
>> > >>> +    unsigned long memkb = num_pages * (PAGE_SIZE / 1024);
>> > >>> +
>> > >>> +    /* Copied from: libxl_get_required_shadow_memory() */
>> > >>> +    memkb = 4 * (256 * d->max_vcpus + 2 * (memkb / 1024));
>> > >>> +    num_pages = ( (memkb + 1023) / 1024) << (20 - PAGE_SHIFT);
>> > >>> +    paging_lock(d);
>> > >>> +    rc = hap_set_allocation(d, num_pages, NULL);
>> > >>> +    paging_unlock(d);
>> > >>>
>> > >>>
>> > >>> The calculation for how many pages of shadow memory are needed
>> > >>> logically belongs in domain_build.c, not hap.c.
>> > >> Iirc it was me requesting it to be moved here, on the basis that
>> > >> the calculation should match the one for other domains, and
>> > >> hence be done alongside that for other domains. domain_build.c
>> > >> shouldn't carry HAP-specific knowledge imo.
>> > > 
>> > > I thought that might be the case, which is why I apologized in
>> > > advance for not reading the previous thread.
>> > > 
>> > > The calculation should indeed match that done for other domains; 
>> > > however, as the comment clearly says, this calculation is done in
>> > > libxl, not in Xen at all.  Everything done to build dom0 in Xen
>> > > which corresponds to things done by the toolstack to build a domU
>> > > logically belongs in domain_build.c.
>> > > 
>> > > Putting it in hap.c would be wrong in any case; what would you do
>> > > when we enable shadow mode for HAP dom0?
>> > 
>> > The function calls hap_set_allocation(), so it's already
>> > HAP-specific.
>> 
>> It really ought to be calling a paging_set_allocation() wrapper;
>> there's nothing _inherently_ HAP-specific about PVH.  Right now that
>> wrapper doesn't exist, because that path goes via paging_domctl() for
>> other domains.
> 
> Correct, PVH requires HAP at present, and I like when the code makes
> that clear. In general, my approach is to make change in future when
> demanded, thus, when shadow is added, it can be rearranged with small
> effort. 
> 
> Anyways, I'm flexible. Jan, final word: Leave it as is (my first pref),
> move to domain_build.c, or put a wrapper(my last preference) ?

I'm fine with leaving it as is (in hap.c), but of course (as usually) my
first preference is to have it done cleanly (i.e. like Tim suggests) -
realizing that this is what would imply you doing more changes than
either of the other two options.

Jan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 2/8] pvh dom0: construct_dom0 changes
  2014-04-05  0:53             ` Mukesh Rathor
  2014-04-07  7:30               ` Jan Beulich
@ 2014-04-07  9:27               ` George Dunlap
  1 sibling, 0 replies; 51+ messages in thread
From: George Dunlap @ 2014-04-07  9:27 UTC (permalink / raw)
  To: Mukesh Rathor, Tim Deegan; +Cc: xen-devel, keir.xen, Jan Beulich

On 04/05/2014 01:53 AM, Mukesh Rathor wrote:
> On Thu, 27 Mar 2014 16:30:37 +0100
> Tim Deegan <tim@xen.org> wrote:
>
>> At 15:04 +0000 on 27 Mar (1395929085), Jan Beulich wrote:
>>>>>> On 27.03.14 at 11:55, <george.dunlap@eu.citrix.com> wrote:
>>>> On 03/27/2014 10:14 AM, Jan Beulich wrote:
>>>>>>>> On 26.03.14 at 20:05, <george.dunlap@eu.citrix.com> wrote:
>>>>>> --- a/xen/arch/x86/mm/hap/hap.c
>>>>>> +++ b/xen/arch/x86/mm/hap/hap.c
>>>>>> @@ -589,6 +589,21 @@ int hap_domctl(struct domain *d,
>>>>>> xen_domctl_shadow_op_t
>>>> *sc,
>>>>>>         }
>>>>>>     }
>>>>>>     
>>>>>> +void __init hap_set_pvh_alloc_for_dom0(struct domain *d,
>>>>>> +                                       unsigned long num_pages)
>>>>>> +{
>>>>>> +    int rc;
>>>>>> +    unsigned long memkb = num_pages * (PAGE_SIZE / 1024);
>>>>>> +
>>>>>> +    /* Copied from: libxl_get_required_shadow_memory() */
>>>>>> +    memkb = 4 * (256 * d->max_vcpus + 2 * (memkb / 1024));
>>>>>> +    num_pages = ( (memkb + 1023) / 1024) << (20 - PAGE_SHIFT);
>>>>>> +    paging_lock(d);
>>>>>> +    rc = hap_set_allocation(d, num_pages, NULL);
>>>>>> +    paging_unlock(d);
>>>>>>
>>>>>>
>>>>>> The calculation for how many pages of shadow memory are needed
>>>>>> logically belongs in domain_build.c, not hap.c.
>>>>> Iirc it was me requesting it to be moved here, on the basis that
>>>>> the calculation should match the one for other domains, and
>>>>> hence be done alongside that for other domains. domain_build.c
>>>>> shouldn't carry HAP-specific knowledge imo.
>>>> I thought that might be the case, which is why I apologized in
>>>> advance for not reading the previous thread.
>>>>
>>>> The calculation should indeed match that done for other domains;
>>>> however, as the comment clearly says, this calculation is done in
>>>> libxl, not in Xen at all.  Everything done to build dom0 in Xen
>>>> which corresponds to things done by the toolstack to build a domU
>>>> logically belongs in domain_build.c.
>>>>
>>>> Putting it in hap.c would be wrong in any case; what would you do
>>>> when we enable shadow mode for HAP dom0?
>>> The function calls hap_set_allocation(), so it's already
>>> HAP-specific.
>> It really ought to be calling a paging_set_allocation() wrapper;
>> there's nothing _inherently_ HAP-specific about PVH.  Right now that
>> wrapper doesn't exist, because that path goes via paging_domctl() for
>> other domains.
> Correct, PVH requires HAP at present, and I like when the code makes
> that clear. In general, my approach is to make change in future when
> demanded, thus, when shadow is added, it can be rearranged with small
> effort.
>
> Anyways, I'm flexible. Jan, final word: Leave it as is (my first pref),
> move to domain_build.c, or put a wrapper(my last preference) ?

I'm fine with the code in domain_build.c calling hap_set_allocation() 
directly for the time being, and implementing a paging_set_allocation() 
wrapper later; but I think the code doing the calculation should be in 
domain_build.c.

  -George

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 7/8] pvh dom0: add check for pvh in vioapic_range
  2014-04-07  6:59           ` Jan Beulich
@ 2014-04-07  9:28             ` George Dunlap
  2014-04-08  1:00               ` Mukesh Rathor
  0 siblings, 1 reply; 51+ messages in thread
From: George Dunlap @ 2014-04-07  9:28 UTC (permalink / raw)
  To: Jan Beulich, Mukesh Rathor
  Cc: xen-devel, keir.xen, eddie.dong, jun.nakajima, tim

On 04/07/2014 07:59 AM, Jan Beulich wrote:
>>>> On 05.04.14 at 03:00, <mukesh.rathor@oracle.com> wrote:
>> On Tue, 01 Apr 2014 16:09:15 +0100
>> "Jan Beulich" <JBeulich@suse.com> wrote:
>>
>>>>>> On 01.04.14 at 16:40, <george.dunlap@eu.citrix.com> wrote:
>>>> On 03/24/2014 09:34 AM, Jan Beulich wrote:
>>>>>>>> On 22.03.14 at 02:39, <mukesh.rathor@oracle.com> wrote:
>>>>>> --- a/xen/arch/x86/hvm/vioapic.c
>>>>>> +++ b/xen/arch/x86/hvm/vioapic.c
>>>>>> @@ -238,8 +238,13 @@ static int vioapic_write(
>>>>>>
>>>>>>    static int vioapic_range(struct vcpu *v, unsigned long addr)
>>>>>>    {
>>>>>> -    struct hvm_hw_vioapic *vioapic = domain_vioapic(v->domain);
>>>>>> +    struct hvm_hw_vioapic *vioapic;
>>>>>> +
>>>>>> +    /* pvh uses event channel callback */
>>>>>> +    if ( is_pvh_vcpu(v) )
>>>>>> +        return 0;
>>>>>>
>>>>>> +    vioapic = domain_vioapic(v->domain);
>>>>> I can see why the extra check is needed, but I can't see why you
>>>>> convert the initializer to an assignment: Afaict domain_vioapic()
>>>>> is safe even if d->arch.hvm_domain.vioapic == NULL.
>>>> Or better yet, just make it something like:
>>>>
>>>> return vioapic && ((addr >= [...original range check]))
>>>>
>>>> That way we don't have to have a PVH-specific hook at all.  If a
>>>> domain doesn't have a vioapic for any reason, return 0.
>>> No, vioapic isn't going to be NULL for PVH:
>>>
>>> #define domain_vioapic(d)
>>> (&(d)->arch.hvm_domain.vioapic->hvm_hw_vioapic)
>> No, viopaic is NULL for PVH, hence the patch. So, can prob just check
>> for the ptr like George suggests and remove the pvh check.
> Sorry, no - I agree that d->arch.hvm_domain.vioapic is NULL for
> PVH, but that doesn't mean the result of domain_vioapic(d) is too
> (see the quoted #define above).

I interpreted Mukesh saying "check for the ptr" as, "check 
d->arch_hvm.domain.vioapic", not "check domain_vioapic(d)".

  -George

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 7/8] pvh dom0: add check for pvh in vioapic_range
  2014-04-07  9:28             ` George Dunlap
@ 2014-04-08  1:00               ` Mukesh Rathor
  2014-04-08  8:21                 ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: Mukesh Rathor @ 2014-04-08  1:00 UTC (permalink / raw)
  To: George Dunlap
  Cc: Jan Beulich, tim, eddie.dong, keir.xen, jun.nakajima, xen-devel

On Mon, 7 Apr 2014 10:28:50 +0100
George Dunlap <george.dunlap@eu.citrix.com> wrote:

> On 04/07/2014 07:59 AM, Jan Beulich wrote:
> >>>> On 05.04.14 at 03:00, <mukesh.rathor@oracle.com> wrote:
> >> On Tue, 01 Apr 2014 16:09:15 +0100
> >> "Jan Beulich" <JBeulich@suse.com> wrote:
> >>
> >>>>>> On 01.04.14 at 16:40, <george.dunlap@eu.citrix.com> wrote:
> >>>> On 03/24/2014 09:34 AM, Jan Beulich wrote:
> >>>>>>>> On 22.03.14 at 02:39, <mukesh.rathor@oracle.com> wrote:
> >>>>>> --- a/xen/arch/x86/hvm/vioapic.c
> >>>>>> +++ b/xen/arch/x86/hvm/vioapic.c
> >>>>>> @@ -238,8 +238,13 @@ static int vioapic_write(
> >>>>>>
> >>>>>>    static int vioapic_range(struct vcpu *v, unsigned long addr)
> >>>>>>    {
> >>>>>> -    struct hvm_hw_vioapic *vioapic =
> >>>>>> domain_vioapic(v->domain);
> >>>>>> +    struct hvm_hw_vioapic *vioapic;
> >>>>>> +
> >>>>>> +    /* pvh uses event channel callback */
> >>>>>> +    if ( is_pvh_vcpu(v) )
> >>>>>> +        return 0;
> >>>>>>
> >>>>>> +    vioapic = domain_vioapic(v->domain);
> >>>>> I can see why the extra check is needed, but I can't see why you
> >>>>> convert the initializer to an assignment: Afaict
> >>>>> domain_vioapic() is safe even if d->arch.hvm_domain.vioapic ==
> >>>>> NULL.
> >>>> Or better yet, just make it something like:
> >>>>
> >>>> return vioapic && ((addr >= [...original range check]))
> >>>>
> >>>> That way we don't have to have a PVH-specific hook at all.  If a
> >>>> domain doesn't have a vioapic for any reason, return 0.
> >>> No, vioapic isn't going to be NULL for PVH:
> >>>
> >>> #define domain_vioapic(d)
> >>> (&(d)->arch.hvm_domain.vioapic->hvm_hw_vioapic)
> >> No, viopaic is NULL for PVH, hence the patch. So, can prob just
> >> check for the ptr like George suggests and remove the pvh check.
> > Sorry, no - I agree that d->arch.hvm_domain.vioapic is NULL for
> > PVH, but that doesn't mean the result of domain_vioapic(d) is too
> > (see the quoted #define above).
> 
> I interpreted Mukesh saying "check for the ptr" as, "check 
> d->arch_hvm.domain.vioapic", not "check domain_vioapic(d)".

Correct. Since, vioapic is NULL for PVH, domain_vioapic(d) will
result in NULL ptr exception for PVH, so we can't call that macro
for PVH.

thanks
Mukesh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 5/8] pvh dom0: Add and remove foreign pages
  2014-04-07  6:57       ` Jan Beulich
@ 2014-04-08  1:11         ` Mukesh Rathor
  2014-04-08  7:36           ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: Mukesh Rathor @ 2014-04-08  1:11 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George.Dunlap, tim, eddie.dong, keir.xen, jun.nakajima, xen-devel

On Mon, 07 Apr 2014 07:57:56 +0100
"Jan Beulich" <JBeulich@suse.com> wrote:

> >>> On 05.04.14 at 03:17, <mukesh.rathor@oracle.com> wrote:
> > On Mon, 24 Mar 2014 09:26:58 +0000
> > "Jan Beulich" <JBeulich@suse.com> wrote:
> > 
> >> >>> On 22.03.14 at 02:39, <mukesh.rathor@oracle.com> wrote:
> >> > +static inline void atomic_write_ept_entry(ept_entry_t *entryptr,
> > ......
> >> > because it
> >> > +     * will update the m2p table which will result in  mfn ->
> >> > gpfn of dom0
> >> > +     * and not fgfn of domU.
> >> > +     */
> >> > +    if ( set_foreign_p2m_entry(tdom, gpfn, _mfn(mfn)) == 0 )
> >> > +        gdprintk(XENLOG_WARNING, "set_foreign_p2m_entry failed.
> >> > "
> >> > +                 "gpfn:%lx mfn:%lx fgfn:%lx td:%d fd:%d\n",
> >> > +                 gpfn, mfn, fgfn, tdom->domain_id,
> >> > fdom->domain_id);
> >> > +    else
> >> > +        rc = 0;
> >> 
> >> Can't you make set_foreign_p2m_entry() return a proper error code
> >> (if it doesn't already) and use that here instead of the relatively
> >> meaningless -EINVAL inherited from above (I suppose the function
> >> could e.g. also return -ENOMEM)?
> > 
> > Well, I wsa trying to keep set_foreign_p2m_entry symmetrical with
> > set_mmio_p2m_entry as they both call the common function 
> > set_typed_p2m_entry which returns 0 for failure. But, no reason
> > why I can't break the symmetry and flip that in
> > set_foreign_p2m_entry. It could return -EINVAL for error, since
> > could be any reason for failure, invalid type or access being more
> > likely. That is not propogated back up from set_p2m_entry()
> > unfortunately.
> 
> In the end I think we'll want to make them all report proper error
> codes...

Ok, how about the following patch then? If it's OK, I'd like to submit
independently.

thanks,
mukesh

------------------------------------------------------------------------------
>From d043dc1b3b6bed55a66137337198ab2947f9ce7d Mon Sep 17 00:00:00 2001
From: Mukesh Rathor <mukesh.rathor@oracle.com>
Date: Mon, 7 Apr 2014 18:02:20 -0700
Subject: [PATCH] P2M error code propogation

This patch doesn't change any functionality. Because some of the leaf
p2m functions return 0 for failure and TRUE for success, the real
errno is lost. We change the code to return proper -errno. Also, any
code in the immediate vicinity that is in coding style violation is
fixed up.
This patch is not required for PVH, but since PVH touches some of
the P2M paths and introduces a new p2m type, we take the opportunity
to fix things up.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
---
 xen/arch/x86/domctl.c            |  9 ++--
 xen/arch/x86/mm/hap/nested_hap.c | 11 ++---
 xen/arch/x86/mm/mem_sharing.c    | 10 ++---
 xen/arch/x86/mm/p2m-ept.c        | 17 +++++---
 xen/arch/x86/mm/p2m-pod.c        |  7 ++-
 xen/arch/x86/mm/p2m-pt.c         | 48 ++++++++++----------
 xen/arch/x86/mm/p2m.c            | 94 ++++++++++++++++++++--------------------
 7 files changed, 99 insertions(+), 97 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 26635ff..94cb390 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -671,13 +671,12 @@ long arch_do_domctl(
             if ( !ret && paging_mode_translate(d) )
             {
                 for ( i = 0; !ret && i < nr_mfns; i++ )
-                    if ( !set_mmio_p2m_entry(d, gfn + i, _mfn(mfn + i)) )
-                        ret = -EIO;
+                    ret = set_mmio_p2m_entry(d, gfn + i, _mfn(mfn + i));
                 if ( ret )
                 {
                     printk(XENLOG_G_WARNING
-                           "memory_map:fail: dom%d gfn=%lx mfn=%lx\n",
-                           d->domain_id, gfn + i, mfn + i);
+                           "memory_map:fail: dom%d gfn=%lx mfn=%lx ret:%ld\n",
+                           d->domain_id, gfn + i, mfn + i, ret);
                     while ( i-- )
                         clear_mmio_p2m_entry(d, gfn + i);
                     if ( iomem_deny_access(d, mfn, mfn + nr_mfns - 1) &&
@@ -696,7 +695,7 @@ long arch_do_domctl(
 
             if ( paging_mode_translate(d) )
                 for ( i = 0; i < nr_mfns; i++ )
-                    add |= !clear_mmio_p2m_entry(d, gfn + i);
+                    add |= !!clear_mmio_p2m_entry(d, gfn + i);
             ret = iomem_deny_access(d, mfn, mfn + nr_mfns - 1);
             if ( !ret && add )
                 ret = -EIO;
diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c
index 38e2327..0ad36c7 100644
--- a/xen/arch/x86/mm/hap/nested_hap.c
+++ b/xen/arch/x86/mm/hap/nested_hap.c
@@ -103,7 +103,7 @@ nestedhap_fix_p2m(struct vcpu *v, struct p2m_domain *p2m,
                   paddr_t L2_gpa, paddr_t L0_gpa,
                   unsigned int page_order, p2m_type_t p2mt, p2m_access_t p2ma)
 {
-    int rv = 1;
+    int rc = 0;
     ASSERT(p2m);
     ASSERT(p2m->set_entry);
 
@@ -124,15 +124,16 @@ nestedhap_fix_p2m(struct vcpu *v, struct p2m_domain *p2m,
         gfn = (L2_gpa >> PAGE_SHIFT) & mask;
         mfn = _mfn((L0_gpa >> PAGE_SHIFT) & mask);
 
-        rv = set_p2m_entry(p2m, gfn, mfn, page_order, p2mt, p2ma);
+        rc = set_p2m_entry(p2m, gfn, mfn, page_order, p2mt, p2ma);
     }
 
     p2m_unlock(p2m);
 
-    if (rv == 0) {
+    if ( rc )
+    {
         gdprintk(XENLOG_ERR,
-		"failed to set entry for %#"PRIx64" -> %#"PRIx64"\n",
-		L2_gpa, L0_gpa);
+		"failed to set entry for %#"PRIx64" -> %#"PRIx64" rc:%d\n",
+		L2_gpa, L0_gpa, rc);
         BUG();
     }
 }
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 7ed6594..290e1b3 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1021,7 +1021,7 @@ int mem_sharing_share_pages(struct domain *sd, unsigned long sgfn, shr_handle_t
         put_page_and_type(cpage);
         d = get_domain_by_id(gfn->domain);
         BUG_ON(!d);
-        BUG_ON(set_shared_p2m_entry(d, gfn->gfn, smfn) == 0);
+        BUG_ON(set_shared_p2m_entry(d, gfn->gfn, smfn));
         put_domain(d);
     }
     ASSERT(list_empty(&cpage->sharing->gfns));
@@ -1095,13 +1095,11 @@ int mem_sharing_add_to_physmap(struct domain *sd, unsigned long sgfn, shr_handle
     ret = set_p2m_entry(p2m, cgfn, smfn, PAGE_ORDER_4K, p2m_ram_shared, a);
 
     /* Tempted to turn this into an assert */
-    if ( !ret )
+    if ( ret )
     {
-        ret = -ENOENT;
         mem_sharing_gfn_destroy(spage, cd, gfn_info);
         put_page_and_type(spage);
     } else {
-        ret = 0;
         /* There is a chance we're plugging a hole where a paged out page was */
         if ( p2m_is_paging(cmfn_type) && (cmfn_type != p2m_ram_paging_out) )
         {
@@ -1232,7 +1230,7 @@ int __mem_sharing_unshare_page(struct domain *d,
     unmap_domain_page(s);
     unmap_domain_page(t);
 
-    BUG_ON(set_shared_p2m_entry(d, gfn, page_to_mfn(page)) == 0);
+    BUG_ON(set_shared_p2m_entry(d, gfn, page_to_mfn(page)));
     mem_sharing_gfn_destroy(old_page, d, gfn_info);
     mem_sharing_page_unlock(old_page);
     put_page_and_type(old_page);
@@ -1288,7 +1286,7 @@ int relinquish_shared_pages(struct domain *d)
              * we hold the p2m lock. */
             set_rc = p2m->set_entry(p2m, gfn, _mfn(0), PAGE_ORDER_4K,
                                     p2m_invalid, p2m_access_rwx);
-            ASSERT(set_rc != 0);
+            ASSERT(set_rc == 0);
             count += 0x10;
         }
         else
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 99a1084..a219f8b 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -273,6 +273,8 @@ static int ept_next_level(struct p2m_domain *p2m, bool_t read_only,
 /*
  * ept_set_entry() computes 'need_modify_vtd_table' for itself,
  * by observing whether any gfn->mfn translations are modified.
+ *
+ * Returns: 0 for success, -errno for failure
  */
 static int
 ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, 
@@ -281,7 +283,7 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
     ept_entry_t *table, *ept_entry = NULL;
     unsigned long gfn_remainder = gfn;
     int i, target = order / EPT_TABLE_ORDER;
-    int rv = 0;
+    int rc = 0;
     int ret = 0;
     bool_t direct_mmio = (p2mt == p2m_mmio_direct);
     uint8_t ipat = 0;
@@ -302,7 +304,7 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
     if ( ((gfn | mfn_x(mfn)) & ((1UL << order) - 1)) ||
          ((u64)gfn >> ((ept_get_wl(ept) + 1) * EPT_TABLE_ORDER)) ||
          (order % EPT_TABLE_ORDER) )
-        return 0;
+        return -EINVAL;
 
     ASSERT((target == 2 && hvm_hap_has_1gb()) ||
            (target == 1 && hvm_hap_has_2mb()) ||
@@ -314,7 +316,10 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
     {
         ret = ept_next_level(p2m, 0, &table, &gfn_remainder, i);
         if ( !ret )
+        {
+            rc = -ENOENT;
             goto out;
+        }
         else if ( ret != GUEST_TABLE_NORMAL_PAGE )
             break;
     }
@@ -386,6 +391,7 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         if ( !ept_split_super_page(p2m, &split_ept_entry, i, target) )
         {
             ept_free_entry(p2m, &split_ept_entry, i);
+            rc = -ENOMEM;
             goto out;
         }
 
@@ -426,9 +432,6 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
          (gfn + (1UL << order) - 1 > p2m->max_mapped_pfn) )
         p2m->max_mapped_pfn = gfn + (1UL << order) - 1;
 
-    /* Success */
-    rv = 1;
-
 out:
     unmap_domain_page(table);
 
@@ -436,7 +439,7 @@ out:
         ept_sync_domain(p2m);
 
     /* For non-nested p2m, may need to change VT-d page table.*/
-    if ( rv && !p2m_is_nestedp2m(p2m) && need_iommu(d) &&
+    if ( rc == 0 && !p2m_is_nestedp2m(p2m) && need_iommu(d) &&
          need_modify_vtd_table )
     {
         if ( iommu_hap_pt_share )
@@ -460,7 +463,7 @@ out:
     if ( is_epte_present(&old_entry) )
         ept_free_entry(p2m, &old_entry, target);
 
-    return rv;
+    return rc;
 }
 
 /* Read ept p2m entries */
diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
index d14565d..31bec46 100644
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -1146,10 +1146,9 @@ guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn,
     }
 
     /* Now, actually do the two-way mapping */
-    if ( !set_p2m_entry(p2m, gfn, _mfn(0), order,
-                        p2m_populate_on_demand, p2m->default_access) )
-        rc = -EINVAL;
-    else
+    rc = set_p2m_entry(p2m, gfn, _mfn(0), order,
+                       p2m_populate_on_demand, p2m->default_access);
+    if ( rc == 0 )
     {
         pod_lock(p2m);
         p2m->pod.entry_count += 1 << order;
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index a1d5650..e1eda97 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -154,6 +154,7 @@ static void p2m_add_iommu_flags(l1_pgentry_t *p2m_entry,
         l1e_add_flags(*p2m_entry, iommu_nlevel_to_flags(nlevel, flags));
 }
 
+/* Returns: 0 for success, -errno for failure */
 static int
 p2m_next_level(struct p2m_domain *p2m, mfn_t *table_mfn, void **table,
                unsigned long *gfn_remainder, unsigned long gfn, u32 shift,
@@ -167,7 +168,7 @@ p2m_next_level(struct p2m_domain *p2m, mfn_t *table_mfn, void **table,
 
     if ( !(p2m_entry = p2m_find_entry(*table, gfn_remainder, gfn,
                                       shift, max)) )
-        return 0;
+        return -ENOENT;
 
     /* PoD/paging: Not present doesn't imply empty. */
     if ( !l1e_get_flags(*p2m_entry) )
@@ -176,7 +177,7 @@ p2m_next_level(struct p2m_domain *p2m, mfn_t *table_mfn, void **table,
 
         pg = p2m_alloc_ptp(p2m, type);
         if ( pg == NULL )
-            return 0;
+            return -ENOMEM;
 
         new_entry = l1e_from_pfn(mfn_x(page_to_mfn(pg)),
                                  __PAGE_HYPERVISOR | _PAGE_USER);
@@ -210,7 +211,7 @@ p2m_next_level(struct p2m_domain *p2m, mfn_t *table_mfn, void **table,
 
         pg = p2m_alloc_ptp(p2m, PGT_l2_page_table);
         if ( pg == NULL )
-            return 0;
+            return -ENOMEM;
 
         flags = l1e_get_flags(*p2m_entry);
         pfn = l1e_get_pfn(*p2m_entry);
@@ -239,7 +240,7 @@ p2m_next_level(struct p2m_domain *p2m, mfn_t *table_mfn, void **table,
 
         pg = p2m_alloc_ptp(p2m, PGT_l1_page_table);
         if ( pg == NULL )
-            return 0;
+            return -ENOMEM;
 
         /* New splintered mappings inherit the flags of the old superpage, 
          * with a little reorganisation for the _PAGE_PSE_PAT bit. */
@@ -272,23 +273,23 @@ p2m_next_level(struct p2m_domain *p2m, mfn_t *table_mfn, void **table,
     unmap_domain_page(*table);
     *table = next;
 
-    return 1;
+    return 0;
 }
 
-// Returns 0 on error (out of memory)
+/* Returns: 0 for success, -errno for failure */
 static int
 p2m_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, 
               unsigned int page_order, p2m_type_t p2mt, p2m_access_t p2ma)
 {
-    // XXX -- this might be able to be faster iff current->domain == d
+    /* XXX -- this might be able to be faster iff current->domain == d */
     mfn_t table_mfn = pagetable_get_mfn(p2m_get_pagetable(p2m));
-    void *table =map_domain_page(mfn_x(table_mfn));
+    void *table = map_domain_page(mfn_x(table_mfn));
     unsigned long i, gfn_remainder = gfn;
     l1_pgentry_t *p2m_entry;
     l1_pgentry_t entry_content;
     l2_pgentry_t l2e_content;
     l3_pgentry_t l3e_content;
-    int rv=0;
+    int rc;
     unsigned int iommu_pte_flags = (p2mt == p2m_ram_rw) ?
                                    IOMMUF_readable|IOMMUF_writable:
                                    0; 
@@ -311,9 +312,10 @@ p2m_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         __trace_var(TRC_MEM_SET_P2M_ENTRY, 0, sizeof(t), &t);
     }
 
-    if ( !p2m_next_level(p2m, &table_mfn, &table, &gfn_remainder, gfn,
-                         L4_PAGETABLE_SHIFT - PAGE_SHIFT,
-                         L4_PAGETABLE_ENTRIES, PGT_l3_page_table) )
+    rc = p2m_next_level(p2m, &table_mfn, &table, &gfn_remainder, gfn,
+                        L4_PAGETABLE_SHIFT - PAGE_SHIFT,
+                        L4_PAGETABLE_ENTRIES, PGT_l3_page_table);
+    if ( rc )
         goto out;
 
     /*
@@ -354,17 +356,18 @@ p2m_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         if ( l1e_get_flags(old_entry) & _PAGE_PRESENT )
             p2m_free_entry(p2m, &old_entry, page_order);
     }
-    else if ( !p2m_next_level(p2m, &table_mfn, &table, &gfn_remainder, gfn,
-                              L3_PAGETABLE_SHIFT - PAGE_SHIFT,
-                              L3_PAGETABLE_ENTRIES,
-                              PGT_l2_page_table) )
+    else if ( (rc = p2m_next_level(p2m, &table_mfn, &table, &gfn_remainder,
+                                   gfn, L3_PAGETABLE_SHIFT - PAGE_SHIFT,
+                                   L3_PAGETABLE_ENTRIES,
+                                   PGT_l2_page_table)) )
         goto out;
 
     if ( page_order == PAGE_ORDER_4K )
     {
-        if ( !p2m_next_level(p2m, &table_mfn, &table, &gfn_remainder, gfn,
-                             L2_PAGETABLE_SHIFT - PAGE_SHIFT,
-                             L2_PAGETABLE_ENTRIES, PGT_l1_page_table) )
+        rc = p2m_next_level(p2m, &table_mfn, &table, &gfn_remainder, gfn,
+                            L2_PAGETABLE_SHIFT - PAGE_SHIFT,
+                            L2_PAGETABLE_ENTRIES, PGT_l1_page_table);
+        if ( rc ) 
             goto out;
 
         p2m_entry = p2m_find_entry(table, &gfn_remainder, gfn,
@@ -452,12 +455,9 @@ p2m_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         }
     }
 
-    /* Success */
-    rv = 1;
-
-out:
+ out:
     unmap_domain_page(table);
-    return rv;
+    return rc;
 }
 
 static mfn_t
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index ec08e18..79b8a5d 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -309,13 +309,14 @@ struct page_info *get_page_from_gfn_p2m(
 }
 
 
+/* Returns: 0 for success, -errno for failure */
 int set_p2m_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, 
                   unsigned int page_order, p2m_type_t p2mt, p2m_access_t p2ma)
 {
     struct domain *d = p2m->domain;
     unsigned long todo = 1ul << page_order;
     unsigned int order;
-    int rc = 1;
+    int set_rc, rc = 0;
 
     ASSERT(gfn_locked_by_me(p2m, gfn));
 
@@ -329,8 +330,8 @@ int set_p2m_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
         else
             order = 0;
 
-        if ( !p2m->set_entry(p2m, gfn, mfn, order, p2mt, p2ma) )
-            rc = 0;
+        if ( (set_rc = p2m->set_entry(p2m, gfn, mfn, order, p2mt, p2ma)) )
+            rc = set_rc;
         gfn += 1ul << order;
         if ( mfn_x(mfn) != INVALID_MFN )
             mfn = _mfn(mfn_x(mfn) + (1ul << order));
@@ -370,17 +371,19 @@ void p2m_free_ptp(struct p2m_domain *p2m, struct page_info *pg)
     return;
 }
 
-// Allocate a new p2m table for a domain.
-//
-// The structure of the p2m table is that of a pagetable for xen (i.e. it is
-// controlled by CONFIG_PAGING_LEVELS).
-//
-// Returns 0 for success or -errno.
-//
+/*
+ * Allocate a new p2m table for a domain.
+ *
+ * The structure of the p2m table is that of a pagetable for xen (i.e. it is
+ * controlled by CONFIG_PAGING_LEVELS).
+ *
+ * Returns 0 for success, -errno for failure.
+ */
 int p2m_alloc_table(struct p2m_domain *p2m)
 {
     struct page_info *p2m_top;
     struct domain *d = p2m->domain;
+    int rc = 0;
 
     p2m_lock(p2m);
 
@@ -417,8 +420,8 @@ int p2m_alloc_table(struct p2m_domain *p2m)
 
     /* Initialise physmap tables for slot zero. Other code assumes this. */
     p2m->defer_nested_flush = 1;
-    if ( !set_p2m_entry(p2m, 0, _mfn(INVALID_MFN), PAGE_ORDER_4K,
-                        p2m_invalid, p2m->default_access) )
+    if ( (rc = set_p2m_entry(p2m, 0, _mfn(INVALID_MFN), PAGE_ORDER_4K,
+                             p2m_invalid, p2m->default_access)) )
         goto error;
     p2m->defer_nested_flush = 0;
 
@@ -428,10 +431,10 @@ int p2m_alloc_table(struct p2m_domain *p2m)
 
     spin_unlock(&p2m->domain->page_alloc_lock);
  error:
-    P2M_PRINTK("failed to initialize p2m table, gfn=%05lx, mfn=%"
-               PRI_mfn "\n", gfn, mfn_x(mfn));
+    P2M_PRINTK("failed to initialize p2m table, gfn=%05lx, mfn=% rc:%d"
+               PRI_mfn "\n", gfn, mfn_x(mfn), rc);
     p2m_unlock(p2m);
-    return -ENOMEM;
+    return rc;
 }
 
 void p2m_teardown(struct p2m_domain *p2m)
@@ -636,11 +639,10 @@ guest_physmap_add_entry(struct domain *d, unsigned long gfn,
     /* Now, actually do the two-way mapping */
     if ( mfn_valid(_mfn(mfn)) ) 
     {
-        if ( !set_p2m_entry(p2m, gfn, _mfn(mfn), page_order, t, p2m->default_access) )
-        {
-            rc = -EINVAL;
+        if ( (rc = set_p2m_entry(p2m, gfn, _mfn(mfn), page_order, t,
+                                 p2m->default_access)) )
             goto out; /* Failed to update p2m, bail without updating m2p. */
-        }
+
         if ( !p2m_is_grant(t) )
         {
             for ( i = 0; i < (1UL << page_order); i++ )
@@ -651,10 +653,8 @@ guest_physmap_add_entry(struct domain *d, unsigned long gfn,
     {
         gdprintk(XENLOG_WARNING, "Adding bad mfn to p2m map (%#lx -> %#lx)\n",
                  gfn, mfn);
-        if ( !set_p2m_entry(p2m, gfn, _mfn(INVALID_MFN), page_order, 
-                            p2m_invalid, p2m->default_access) )
-            rc = -EINVAL;
-        else
+        if ( (rc = set_p2m_entry(p2m, gfn, _mfn(INVALID_MFN), page_order, 
+                                 p2m_invalid, p2m->default_access)) == 0 )
         {
             pod_lock(p2m);
             p2m->pod.entry_count -= pod_count;
@@ -741,7 +741,7 @@ void p2m_change_type_range(struct domain *d,
 }
 
 
-
+/* Returns: 0 for success, -errno for failure */
 int
 set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
 {
@@ -752,7 +752,7 @@ set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
 
     if ( !paging_mode_translate(d) )
-        return 0;
+        return -EPERM;
 
     gfn_lock(p2m, gfn, 0);
     omfn = p2m->get_entry(p2m, gfn, &ot, &a, 0, NULL);
@@ -760,7 +760,7 @@ set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
     {
         p2m_unlock(p2m);
         domain_crash(d);
-        return 0;
+        return -ENOENT;
     }
     else if ( p2m_is_ram(ot) )
     {
@@ -769,26 +769,28 @@ set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
     }
 
     P2M_DEBUG("set mmio %lx %lx\n", gfn, mfn_x(mfn));
-    rc = set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_mmio_direct, p2m->default_access);
+    rc = set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_mmio_direct,
+                       p2m->default_access);
     gfn_unlock(p2m, gfn, 0);
-    if ( 0 == rc )
+    if ( rc )
         gdprintk(XENLOG_ERR,
-            "set_mmio_p2m_entry: set_p2m_entry failed! mfn=%08lx\n",
-            mfn_x(get_gfn_query_unlocked(p2m->domain, gfn, &ot)));
+            "set_mmio_p2m_entry: set_p2m_entry failed! mfn=%08lx rc:%d\n",
+            mfn_x(get_gfn_query_unlocked(p2m->domain, gfn, &ot)), rc);
     return rc;
 }
 
+/* Returns: 0 for success, -errno for failure */
 int
 clear_mmio_p2m_entry(struct domain *d, unsigned long gfn)
 {
-    int rc = 0;
+    int rc = -EINVAL;
     mfn_t mfn;
     p2m_access_t a;
     p2m_type_t t;
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
 
     if ( !paging_mode_translate(d) )
-        return 0;
+        return -EPERM;
 
     gfn_lock(p2m, gfn, 0);
     mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL);
@@ -800,14 +802,16 @@ clear_mmio_p2m_entry(struct domain *d, unsigned long gfn)
             "clear_mmio_p2m_entry: gfn_to_mfn failed! gfn=%08lx\n", gfn);
         goto out;
     }
-    rc = set_p2m_entry(p2m, gfn, _mfn(INVALID_MFN), PAGE_ORDER_4K, p2m_invalid, p2m->default_access);
+    rc = set_p2m_entry(p2m, gfn, _mfn(INVALID_MFN), PAGE_ORDER_4K, p2m_invalid,
+                       p2m->default_access);
 
-out:
+ out:
     gfn_unlock(p2m, gfn, 0);
 
     return rc;
 }
 
+/* Returns: 0 for success, -errno for failure */
 int
 set_shared_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
 {
@@ -819,7 +823,7 @@ set_shared_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
     unsigned long pg_type;
 
     if ( !paging_mode_translate(p2m->domain) )
-        return 0;
+        return -EPERM;
 
     gfn_lock(p2m, gfn, 0);
     omfn = p2m->get_entry(p2m, gfn, &ot, &a, 0, NULL);
@@ -835,12 +839,13 @@ set_shared_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
         set_gpfn_from_mfn(mfn_x(omfn), INVALID_M2P_ENTRY);
 
     P2M_DEBUG("set shared %lx %lx\n", gfn, mfn_x(mfn));
-    rc = set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_ram_shared, p2m->default_access);
+    rc = set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_ram_shared, 
+                       p2m->default_access);
     gfn_unlock(p2m, gfn, 0);
-    if ( 0 == rc )
+    if ( rc )
         gdprintk(XENLOG_ERR,
-            "set_shared_p2m_entry: set_p2m_entry failed! mfn=%08lx\n",
-            mfn_x(get_gfn_query_unlocked(p2m->domain, gfn, &ot)));
+            "set_shared_p2m_entry: set_p2m_entry failed! mfn=%08lx rc:%d\n",
+            mfn_x(get_gfn_query_unlocked(p2m->domain, gfn, &ot)), rc);
     return rc;
 }
 
@@ -1259,7 +1264,7 @@ bool_t p2m_mem_access_check(paddr_t gpa, bool_t gla_valid, unsigned long gla,
     if ( access_w && p2ma == p2m_access_rx2rw ) 
     {
         rc = p2m->set_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2mt, p2m_access_rw);
-        ASSERT(rc);
+        ASSERT(rc == 0);
         gfn_unlock(p2m, gfn, 0);
         return 1;
     }
@@ -1268,7 +1273,7 @@ bool_t p2m_mem_access_check(paddr_t gpa, bool_t gla_valid, unsigned long gla,
         ASSERT(access_w || access_r || access_x);
         rc = p2m->set_entry(p2m, gfn, mfn, PAGE_ORDER_4K,
                             p2mt, p2m_access_rwx);
-        ASSERT(rc);
+        ASSERT(rc == 0);
     }
     gfn_unlock(p2m, gfn, 0);
 
@@ -1295,7 +1300,7 @@ bool_t p2m_mem_access_check(paddr_t gpa, bool_t gla_valid, unsigned long gla,
                  * gfn locked and just did a successful get_entry(). */
                 rc = p2m->set_entry(p2m, gfn, mfn, PAGE_ORDER_4K,
                                     p2mt, p2m_access_rwx);
-                ASSERT(rc);
+                ASSERT(rc == 0);
             }
             gfn_unlock(p2m, gfn, 0);
             return 1;
@@ -1393,11 +1398,8 @@ long p2m_set_mem_access(struct domain *d, unsigned long pfn, uint32_t nr,
     for ( ; ; ++pfn )
     {
         mfn = p2m->get_entry(p2m, pfn, &t, &_a, 0, NULL);
-        if ( p2m->set_entry(p2m, pfn, mfn, PAGE_ORDER_4K, t, a) == 0 )
-        {
-            rc = -ENOMEM;
+        if ( (rc = p2m->set_entry(p2m, pfn, mfn, PAGE_ORDER_4K, t, a)) )
             break;
-        }
 
         /* Check for continuation if it's not the last interation. */
         if ( !--nr || hypercall_preempt_check() )
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 5/8] pvh dom0: Add and remove foreign pages
  2014-04-08  1:11         ` Mukesh Rathor
@ 2014-04-08  7:36           ` Jan Beulich
  2014-04-08 14:01             ` Tim Deegan
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2014-04-08  7:36 UTC (permalink / raw)
  To: Mukesh Rathor
  Cc: George.Dunlap, tim, eddie.dong, keir.xen, jun.nakajima, xen-devel

>>> On 08.04.14 at 03:11, <mukesh.rathor@oracle.com> wrote:
> On Mon, 07 Apr 2014 07:57:56 +0100
> "Jan Beulich" <JBeulich@suse.com> wrote:
>> In the end I think we'll want to make them all report proper error
>> codes...
> 
> Ok, how about the following patch then? If it's OK, I'd like to submit
> independently.

Yes, I just now stumbled across this oddity too. Looks generally
like what we want - a couple of comments below.

> @@ -696,7 +695,7 @@ long arch_do_domctl(
>  
>              if ( paging_mode_translate(d) )
>                  for ( i = 0; i < nr_mfns; i++ )
> -                    add |= !clear_mmio_p2m_entry(d, gfn + i);
> +                    add |= !!clear_mmio_p2m_entry(d, gfn + i);

No need for the double !!, and in fact ...

>              ret = iomem_deny_access(d, mfn, mfn + nr_mfns - 1);
>              if ( !ret && add )
>                  ret = -EIO;

... you should be propagating the error code here anyway rather
than making one up.

> @@ -124,15 +124,16 @@ nestedhap_fix_p2m(struct vcpu *v, struct p2m_domain *p2m,
>          gfn = (L2_gpa >> PAGE_SHIFT) & mask;
>          mfn = _mfn((L0_gpa >> PAGE_SHIFT) & mask);
>  
> -        rv = set_p2m_entry(p2m, gfn, mfn, page_order, p2mt, p2ma);
> +        rc = set_p2m_entry(p2m, gfn, mfn, page_order, p2mt, p2ma);
>      }
>  
>      p2m_unlock(p2m);
>  
> -    if (rv == 0) {
> +    if ( rc )
> +    {
>          gdprintk(XENLOG_ERR,
> -		"failed to set entry for %#"PRIx64" -> %#"PRIx64"\n",
> -		L2_gpa, L0_gpa);
> +		"failed to set entry for %#"PRIx64" -> %#"PRIx64" rc:%d\n",
> +		L2_gpa, L0_gpa, rc);

Together with the other coding style cleanup you should also get rid
of the hard tabs here.

> @@ -769,26 +769,28 @@ set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
>      }
>  
>      P2M_DEBUG("set mmio %lx %lx\n", gfn, mfn_x(mfn));
> -    rc = set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_mmio_direct, p2m->default_access);
> +    rc = set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_mmio_direct,
> +                       p2m->default_access);
>      gfn_unlock(p2m, gfn, 0);
> -    if ( 0 == rc )
> +    if ( rc )
>          gdprintk(XENLOG_ERR,
> -            "set_mmio_p2m_entry: set_p2m_entry failed! mfn=%08lx\n",
> -            mfn_x(get_gfn_query_unlocked(p2m->domain, gfn, &ot)));
> +            "set_mmio_p2m_entry: set_p2m_entry failed! mfn=%08lx rc:%d\n",
> +            mfn_x(get_gfn_query_unlocked(p2m->domain, gfn, &ot)), rc);

Again, if you already alter this and do minor cleanup at once, this
should become

	printk(XENLOG_G_ERR "d%d: ...", ...);

or the "set_mmio_p2m_entry:" prefix should be dropped.

> @@ -835,12 +839,13 @@ set_shared_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
>          set_gpfn_from_mfn(mfn_x(omfn), INVALID_M2P_ENTRY);
>  
>      P2M_DEBUG("set shared %lx %lx\n", gfn, mfn_x(mfn));
> -    rc = set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_ram_shared, p2m->default_access);
> +    rc = set_p2m_entry(p2m, gfn, mfn, PAGE_ORDER_4K, p2m_ram_shared, 
> +                       p2m->default_access);
>      gfn_unlock(p2m, gfn, 0);
> -    if ( 0 == rc )
> +    if ( rc )
>          gdprintk(XENLOG_ERR,
> -            "set_shared_p2m_entry: set_p2m_entry failed! mfn=%08lx\n",
> -            mfn_x(get_gfn_query_unlocked(p2m->domain, gfn, &ot)));
> +            "set_shared_p2m_entry: set_p2m_entry failed! mfn=%08lx rc:%d\n",
> +            mfn_x(get_gfn_query_unlocked(p2m->domain, gfn, &ot)), rc);

Similarly here.

Jan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 7/8] pvh dom0: add check for pvh in vioapic_range
  2014-04-08  1:00               ` Mukesh Rathor
@ 2014-04-08  8:21                 ` Jan Beulich
  0 siblings, 0 replies; 51+ messages in thread
From: Jan Beulich @ 2014-04-08  8:21 UTC (permalink / raw)
  To: Mukesh Rathor
  Cc: George Dunlap, tim, eddie.dong, keir.xen, jun.nakajima, xen-devel

>>> On 08.04.14 at 03:00, <mukesh.rathor@oracle.com> wrote:
> On Mon, 7 Apr 2014 10:28:50 +0100
> George Dunlap <george.dunlap@eu.citrix.com> wrote:
> 
>> On 04/07/2014 07:59 AM, Jan Beulich wrote:
>> >>>> On 05.04.14 at 03:00, <mukesh.rathor@oracle.com> wrote:
>> >> On Tue, 01 Apr 2014 16:09:15 +0100
>> >> "Jan Beulich" <JBeulich@suse.com> wrote:
>> >>
>> >>>>>> On 01.04.14 at 16:40, <george.dunlap@eu.citrix.com> wrote:
>> >>>> On 03/24/2014 09:34 AM, Jan Beulich wrote:
>> >>>>>>>> On 22.03.14 at 02:39, <mukesh.rathor@oracle.com> wrote:
>> >>>>>> --- a/xen/arch/x86/hvm/vioapic.c
>> >>>>>> +++ b/xen/arch/x86/hvm/vioapic.c
>> >>>>>> @@ -238,8 +238,13 @@ static int vioapic_write(
>> >>>>>>
>> >>>>>>    static int vioapic_range(struct vcpu *v, unsigned long addr)
>> >>>>>>    {
>> >>>>>> -    struct hvm_hw_vioapic *vioapic =
>> >>>>>> domain_vioapic(v->domain);
>> >>>>>> +    struct hvm_hw_vioapic *vioapic;
>> >>>>>> +
>> >>>>>> +    /* pvh uses event channel callback */
>> >>>>>> +    if ( is_pvh_vcpu(v) )
>> >>>>>> +        return 0;
>> >>>>>>
>> >>>>>> +    vioapic = domain_vioapic(v->domain);
>> >>>>> I can see why the extra check is needed, but I can't see why you
>> >>>>> convert the initializer to an assignment: Afaict
>> >>>>> domain_vioapic() is safe even if d->arch.hvm_domain.vioapic ==
>> >>>>> NULL.
>> >>>> Or better yet, just make it something like:
>> >>>>
>> >>>> return vioapic && ((addr >= [...original range check]))
>> >>>>
>> >>>> That way we don't have to have a PVH-specific hook at all.  If a
>> >>>> domain doesn't have a vioapic for any reason, return 0.
>> >>> No, vioapic isn't going to be NULL for PVH:
>> >>>
>> >>> #define domain_vioapic(d)
>> >>> (&(d)->arch.hvm_domain.vioapic->hvm_hw_vioapic)
>> >> No, viopaic is NULL for PVH, hence the patch. So, can prob just
>> >> check for the ptr like George suggests and remove the pvh check.
>> > Sorry, no - I agree that d->arch.hvm_domain.vioapic is NULL for
>> > PVH, but that doesn't mean the result of domain_vioapic(d) is too
>> > (see the quoted #define above).
>> 
>> I interpreted Mukesh saying "check for the ptr" as, "check 
>> d->arch_hvm.domain.vioapic", not "check domain_vioapic(d)".
> 
> Correct. Since, vioapic is NULL for PVH, domain_vioapic(d) will
> result in NULL ptr exception for PVH,

Why/how? The macro doesn't deref that pointer, it only takes the
address of a field within the structure (which will result in a pointer
to or slightly above NULL).

Jan

> so we can't call that macro for PVH.
> 
> thanks
> Mukesh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 5/8] pvh dom0: Add and remove foreign pages
  2014-04-08  7:36           ` Jan Beulich
@ 2014-04-08 14:01             ` Tim Deegan
  2014-04-08 14:07               ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: Tim Deegan @ 2014-04-08 14:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George.Dunlap, eddie.dong, keir.xen, jun.nakajima, xen-devel

At 08:36 +0100 on 08 Apr (1396942615), Jan Beulich wrote:
> >>> On 08.04.14 at 03:11, <mukesh.rathor@oracle.com> wrote:
> > On Mon, 07 Apr 2014 07:57:56 +0100
> > "Jan Beulich" <JBeulich@suse.com> wrote:
> >> In the end I think we'll want to make them all report proper error
> >> codes...
> > 
> > Ok, how about the following patch then? If it's OK, I'd like to submit
> > independently.
> 
> Yes, I just now stumbled across this oddity too. Looks generally
> like what we want

While in general I like this, I'm worried about inverting the sense of
the return code -- that seems like setting a trap for ourselves with
backporting patches past this one.

I'd be inclined to change the name of the function as a safety catch,
though I can't immediately think of a good new name for set_p2m_entry.
Maybe just truncate it to set_p2m()?

Tim.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 5/8] pvh dom0: Add and remove foreign pages
  2014-04-08 14:01             ` Tim Deegan
@ 2014-04-08 14:07               ` Jan Beulich
  2014-04-08 14:18                 ` Tim Deegan
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2014-04-08 14:07 UTC (permalink / raw)
  To: Tim Deegan; +Cc: George.Dunlap, eddie.dong, keir.xen, jun.nakajima, xen-devel

>>> On 08.04.14 at 16:01, <tim@xen.org> wrote:
> At 08:36 +0100 on 08 Apr (1396942615), Jan Beulich wrote:
>> >>> On 08.04.14 at 03:11, <mukesh.rathor@oracle.com> wrote:
>> > On Mon, 07 Apr 2014 07:57:56 +0100
>> > "Jan Beulich" <JBeulich@suse.com> wrote:
>> >> In the end I think we'll want to make them all report proper error
>> >> codes...
>> > 
>> > Ok, how about the following patch then? If it's OK, I'd like to submit
>> > independently.
>> 
>> Yes, I just now stumbled across this oddity too. Looks generally
>> like what we want
> 
> While in general I like this, I'm worried about inverting the sense of
> the return code -- that seems like setting a trap for ourselves with
> backporting patches past this one.
> 
> I'd be inclined to change the name of the function as a safety catch,
> though I can't immediately think of a good new name for set_p2m_entry.
> Maybe just truncate it to set_p2m()?

That could mean a different thing. How about p2m_set_entry(),
at once matching most other P2M functions using p2m_ as their
prefix.

Jan

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 5/8] pvh dom0: Add and remove foreign pages
  2014-04-08 14:07               ` Jan Beulich
@ 2014-04-08 14:18                 ` Tim Deegan
  2014-04-08 15:40                   ` George Dunlap
  0 siblings, 1 reply; 51+ messages in thread
From: Tim Deegan @ 2014-04-08 14:18 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George.Dunlap, eddie.dong, keir.xen, jun.nakajima, xen-devel

At 15:07 +0100 on 08 Apr (1396966037), Jan Beulich wrote:
> >>> On 08.04.14 at 16:01, <tim@xen.org> wrote:
> > At 08:36 +0100 on 08 Apr (1396942615), Jan Beulich wrote:
> >> >>> On 08.04.14 at 03:11, <mukesh.rathor@oracle.com> wrote:
> >> > On Mon, 07 Apr 2014 07:57:56 +0100
> >> > "Jan Beulich" <JBeulich@suse.com> wrote:
> >> >> In the end I think we'll want to make them all report proper error
> >> >> codes...
> >> > 
> >> > Ok, how about the following patch then? If it's OK, I'd like to submit
> >> > independently.
> >> 
> >> Yes, I just now stumbled across this oddity too. Looks generally
> >> like what we want
> > 
> > While in general I like this, I'm worried about inverting the sense of
> > the return code -- that seems like setting a trap for ourselves with
> > backporting patches past this one.
> > 
> > I'd be inclined to change the name of the function as a safety catch,
> > though I can't immediately think of a good new name for set_p2m_entry.
> > Maybe just truncate it to set_p2m()?
> 
> That could mean a different thing. How about p2m_set_entry(),
> at once matching most other P2M functions using p2m_ as their
> prefix.

Sure.  I think the current implementation in p2m_pt might be using
that name, but it could be given a more meaningful name at the same
time.

Tim.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 5/8] pvh dom0: Add and remove foreign pages
  2014-04-08 14:18                 ` Tim Deegan
@ 2014-04-08 15:40                   ` George Dunlap
  0 siblings, 0 replies; 51+ messages in thread
From: George Dunlap @ 2014-04-08 15:40 UTC (permalink / raw)
  To: Tim Deegan, Jan Beulich; +Cc: xen-devel, keir.xen, eddie.dong, jun.nakajima

On 04/08/2014 03:18 PM, Tim Deegan wrote:
> At 15:07 +0100 on 08 Apr (1396966037), Jan Beulich wrote:
>>>>> On 08.04.14 at 16:01, <tim@xen.org> wrote:
>>> At 08:36 +0100 on 08 Apr (1396942615), Jan Beulich wrote:
>>>>>>> On 08.04.14 at 03:11, <mukesh.rathor@oracle.com> wrote:
>>>>> On Mon, 07 Apr 2014 07:57:56 +0100
>>>>> "Jan Beulich" <JBeulich@suse.com> wrote:
>>>>>> In the end I think we'll want to make them all report proper error
>>>>>> codes...
>>>>>
>>>>> Ok, how about the following patch then? If it's OK, I'd like to submit
>>>>> independently.
>>>>
>>>> Yes, I just now stumbled across this oddity too. Looks generally
>>>> like what we want
>>>
>>> While in general I like this, I'm worried about inverting the sense of
>>> the return code -- that seems like setting a trap for ourselves with
>>> backporting patches past this one.
>>>
>>> I'd be inclined to change the name of the function as a safety catch,
>>> though I can't immediately think of a good new name for set_p2m_entry.
>>> Maybe just truncate it to set_p2m()?
>>
>> That could mean a different thing. How about p2m_set_entry(),
>> at once matching most other P2M functions using p2m_ as their
>> prefix.
>
> Sure.  I think the current implementation in p2m_pt might be using
> that name, but it could be given a more meaningful name at the same
> time.

Ack to the rename.

  -George

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 5/8] pvh dom0: Add and remove foreign pages
  2014-03-24  9:26   ` Jan Beulich
  2014-04-05  1:17     ` Mukesh Rathor
@ 2014-04-11  1:33     ` Mukesh Rathor
  2014-04-11  8:02       ` Jan Beulich
  1 sibling, 1 reply; 51+ messages in thread
From: Mukesh Rathor @ 2014-04-11  1:33 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George.Dunlap, tim, eddie.dong, keir.xen, jun.nakajima, xen-devel

On Mon, 24 Mar 2014 09:26:58 +0000
"Jan Beulich" <JBeulich@suse.com> wrote:

> >>> On 22.03.14 at 02:39, <mukesh.rathor@oracle.com> wrote:
> > +static inline void atomic_write_ept_entry(ept_entry_t *entryptr,
> > +                                          const ept_entry_t *new)
> > +{
> > +    unsigned long oldmfn = 0;
> > +
> > +    if ( p2m_is_foreign(new->sa_p2mt) )
> > +    {
> > +        struct page_info *page;
> > +        struct domain *fdom;
> > +
> > +        ASSERT(mfn_valid(new->mfn));
> > +        page = mfn_to_page(new->mfn);
> > +        fdom = page_get_owner(page);
> > +        get_page(page, fdom);
> 
> I'm afraid the checking here is too weak, or at least inconsistent
> (considering the ASSERT()): mfn_valid() isn't a sufficient condition
> for page_get_owner() to return non-NULL or get_page() to
> succeed. If all callers are supposed to guarantee this, then further
> ASSERT()s should be added here. If not, error checking is going to

Correct, callers pretty much guarantee that. There are stringent checks
done in p2m_add_foreign. How about:

        ASSERT(mfn_valid(new->mfn));
        page = mfn_to_page(new->mfn);
        fdom = page_get_owner(page);
        ASSERT(fdom);
        rc = get_page(page, fdom);
        ASSERT(rc == 0);


> be necessary (and possibly the ASSERT() then also needs to be
> converted to an error check).
> 
> I also wonder whether page_get_owner_and_reference() wouldn't
> be more appropriate to be used here.

Could.  get_page() does an extra check (owner == domain) for free. But
either way. Tim, preference?


> > @@ -275,14 +276,19 @@ struct page_info *get_page_from_gfn_p2m(
> >          /* Fast path: look up and get out */
> >          p2m_read_lock(p2m);
> >          mfn = __get_gfn_type_access(p2m, gfn, t, a, 0, NULL, 0);
> > -        if ( (p2m_is_ram(*t) || p2m_is_grant(*t))
> > +        if ( (p2m_is_ram(*t) || p2m_is_grant(*t) ||
> > p2m_is_foreign(*t) )
> 
> Would this perhaps better become something like p2m_is_any_ram()?

yes. p2m_remove_page has similar (inverted) check. will do.


> (I'm in fact surprised this is only needed in exactly one place.)

Looks like it could be added to set_mmio_p2m_entry to make sure 
mmio is not mapped at foreign mapping, and perhaps guest_physmap_add_entry.
Other places, shadow and p2m-pt, has no support at present.

> 
> > +int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
> > +                    unsigned long gpfn, domid_t fdid)
> > +{
> > +    int rc = -ESRCH;
> > +    p2m_type_t p2mt, p2mt_prev;
> > +    unsigned long prev_mfn, mfn;
> > +    struct page_info *page = NULL;
> > +    struct domain *fdom = NULL;
> > +
> > +    /* xentrace mapping pages from xen heap are owned by DOMID_XEN
> > */
> > +    if ( (fdid == DOMID_XEN && (fdom = rcu_lock_domain(dom_xen))
> > == NULL) ||
> > +         (fdid != DOMID_XEN && (fdom =
> > rcu_lock_domain_by_id(fdid)) == NULL) )
> > +        goto out;
> 
> Didn't you mean to call get_pg_owner() here, at once taking care of
> DOMID_IO too? Also I don't think the reference to xentrace is really
> useful here - DOMID_XEN owned pages certainly exist elsewhere too.

IIRC, I think I didn't because it will let you get away with DOMID_SELF.
I could just check for it before calling get_pg_owner:

if ( fdid == DOMID_SELF )
    goto out with -EINVAL; 
else if ( (fdom = get_pg_owner(fdid)) == NULL )
    goto out with -ESRCH; 
..

what do you think?

> Of course the question is whether for the purposes you have here
> DOMID_XEN/DOMID_IO owned pages are actually validly making it
> here.

Yes. xentrace running on pvh dom0 wants to map xen heap pages. You
mentioned there are possibly other users. I'm not familiar with
DOMID_IO use case at this point.

thanks for your time.
Mukesh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [V8 PATCH 5/8] pvh dom0: Add and remove foreign pages
  2014-04-11  1:33     ` Mukesh Rathor
@ 2014-04-11  8:02       ` Jan Beulich
  0 siblings, 0 replies; 51+ messages in thread
From: Jan Beulich @ 2014-04-11  8:02 UTC (permalink / raw)
  To: Mukesh Rathor
  Cc: George.Dunlap, tim, eddie.dong, keir.xen, jun.nakajima, xen-devel

>>> On 11.04.14 at 03:33, <mukesh.rathor@oracle.com> wrote:
> On Mon, 24 Mar 2014 09:26:58 +0000
> "Jan Beulich" <JBeulich@suse.com> wrote:
> 
>> >>> On 22.03.14 at 02:39, <mukesh.rathor@oracle.com> wrote:
>> > +static inline void atomic_write_ept_entry(ept_entry_t *entryptr,
>> > +                                          const ept_entry_t *new)
>> > +{
>> > +    unsigned long oldmfn = 0;
>> > +
>> > +    if ( p2m_is_foreign(new->sa_p2mt) )
>> > +    {
>> > +        struct page_info *page;
>> > +        struct domain *fdom;
>> > +
>> > +        ASSERT(mfn_valid(new->mfn));
>> > +        page = mfn_to_page(new->mfn);
>> > +        fdom = page_get_owner(page);
>> > +        get_page(page, fdom);
>> 
>> I'm afraid the checking here is too weak, or at least inconsistent
>> (considering the ASSERT()): mfn_valid() isn't a sufficient condition
>> for page_get_owner() to return non-NULL or get_page() to
>> succeed. If all callers are supposed to guarantee this, then further
>> ASSERT()s should be added here. If not, error checking is going to
> 
> Correct, callers pretty much guarantee that. There are stringent checks
> done in p2m_add_foreign. How about:
> 
>         ASSERT(mfn_valid(new->mfn));
>         page = mfn_to_page(new->mfn);
>         fdom = page_get_owner(page);
>         ASSERT(fdom);
>         rc = get_page(page, fdom);
>         ASSERT(rc == 0);

Yes, minimally that (so we'll at least know if some of the assumptions
are wrong).

>> > +int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
>> > +                    unsigned long gpfn, domid_t fdid)
>> > +{
>> > +    int rc = -ESRCH;
>> > +    p2m_type_t p2mt, p2mt_prev;
>> > +    unsigned long prev_mfn, mfn;
>> > +    struct page_info *page = NULL;
>> > +    struct domain *fdom = NULL;
>> > +
>> > +    /* xentrace mapping pages from xen heap are owned by DOMID_XEN
>> > */
>> > +    if ( (fdid == DOMID_XEN && (fdom = rcu_lock_domain(dom_xen))
>> > == NULL) ||
>> > +         (fdid != DOMID_XEN && (fdom =
>> > rcu_lock_domain_by_id(fdid)) == NULL) )
>> > +        goto out;
>> 
>> Didn't you mean to call get_pg_owner() here, at once taking care of
>> DOMID_IO too? Also I don't think the reference to xentrace is really
>> useful here - DOMID_XEN owned pages certainly exist elsewhere too.
> 
> IIRC, I think I didn't because it will let you get away with DOMID_SELF.
> I could just check for it before calling get_pg_owner:
> 
> if ( fdid == DOMID_SELF )
>     goto out with -EINVAL; 
> else if ( (fdom = get_pg_owner(fdid)) == NULL )
>     goto out with -ESRCH; 
> ..
> 
> what do you think?

Much more readable than the original imo.

Jan

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2014-04-11  8:16 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-22  1:39 [V8 PATCH 0/8] pvh dom0 Mukesh Rathor
2014-03-22  1:39 ` [V8 PATCH 1/8] pvh dom0: move some pv specific code to static functions Mukesh Rathor
2014-03-22  1:39 ` [V8 PATCH 2/8] pvh dom0: construct_dom0 changes Mukesh Rathor
2014-03-26 19:05   ` George Dunlap
2014-03-27 10:14     ` Jan Beulich
2014-03-27 10:55       ` George Dunlap
2014-03-27 11:03         ` George Dunlap
2014-03-27 15:04         ` Jan Beulich
2014-03-27 15:30           ` Tim Deegan
2014-04-05  0:53             ` Mukesh Rathor
2014-04-07  7:30               ` Jan Beulich
2014-04-07  9:27               ` George Dunlap
2014-03-22  1:39 ` [V8 PATCH 3/8] pvh dom0: Introduce p2m_map_foreign Mukesh Rathor
2014-03-24  9:00   ` Jan Beulich
2014-03-27 12:29   ` George Dunlap
2014-04-05  0:57     ` Mukesh Rathor
2014-03-22  1:39 ` [V8 PATCH 4/8] pvh dom0: make xsm_map_gmfn_foreign available for x86 Mukesh Rathor
2014-03-25 17:53   ` Daniel De Graaf
2014-03-22  1:39 ` [V8 PATCH 5/8] pvh dom0: Add and remove foreign pages Mukesh Rathor
2014-03-24  9:26   ` Jan Beulich
2014-04-05  1:17     ` Mukesh Rathor
2014-04-07  6:57       ` Jan Beulich
2014-04-08  1:11         ` Mukesh Rathor
2014-04-08  7:36           ` Jan Beulich
2014-04-08 14:01             ` Tim Deegan
2014-04-08 14:07               ` Jan Beulich
2014-04-08 14:18                 ` Tim Deegan
2014-04-08 15:40                   ` George Dunlap
2014-04-11  1:33     ` Mukesh Rathor
2014-04-11  8:02       ` Jan Beulich
2014-03-22  1:39 ` [V8 PATCH 6/8] pvh dom0: allow get_pg_owner for translated domains Mukesh Rathor
2014-03-24  9:31   ` Jan Beulich
2014-04-01 14:31     ` George Dunlap
2014-04-05  0:59       ` Mukesh Rathor
2014-03-22  1:39 ` [V8 PATCH 7/8] pvh dom0: add check for pvh in vioapic_range Mukesh Rathor
2014-03-24  9:34   ` Jan Beulich
2014-04-01 14:40     ` George Dunlap
2014-04-01 15:09       ` Jan Beulich
2014-04-05  1:00         ` Mukesh Rathor
2014-04-07  6:59           ` Jan Beulich
2014-04-07  9:28             ` George Dunlap
2014-04-08  1:00               ` Mukesh Rathor
2014-04-08  8:21                 ` Jan Beulich
2014-03-22  1:39 ` [V8 PATCH 8/8] pvh dom0: add opt_dom0pvh to setup.c Mukesh Rathor
2014-03-24  9:35   ` Jan Beulich
2014-03-24  8:57 ` [V8 PATCH 0/8] pvh dom0 Jan Beulich
2014-03-24 21:36   ` Mukesh Rathor
2014-03-28 17:36 ` Roger Pau Monné
2014-03-28 19:48   ` Mukesh Rathor
2014-04-01 16:04 ` George Dunlap
2014-04-02  1:22   ` Mukesh Rathor

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.