* [PATCH for-4.12 0/8] pvh/dom0/shadow/amd fixes
@ 2019-01-30 10:36 Roger Pau Monne
  2019-01-30 10:36 ` [PATCH for-4.12 1/8] dom0/pvh: align allocation and mapping order to start address Roger Pau Monne
                   ` (7 more replies)
  0 siblings, 8 replies; 62+ messages in thread
From: Roger Pau Monne @ 2019-01-30 10:36 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne

Hello,

The following series contains fixes that should be considered for 4.12.

I'm not sure whether patches 6, 7 and 8 should be aimed at 4.12, as they
contain changes to the p2m code that could affect HVM guests. Note that
without those changes a PVH dom0 running on AMD hardware will be unable
to create guests. Overall the patches are a nice cleanup of the handling
of the p2m_ioreq_server and p2m_map_foreign types.

The series can also be found at:

git://xenbits.xen.org/people/royger/xen.git fixes-4.12

Thanks, Roger.

Roger Pau Monne (8):
  dom0/pvh: align allocation and mapping order to start address
  amd/npt: remove assert that prevents creating 2M MMIO entries
  iommu/pvh: add reserved regions below 1MB to the iommu page tables
  x86/shadow: alloc enough pages so initialization doesn't fail
  pvh/dom0: warn when dom0_mem is not set to a fixed value
  x86/mm: split p2m ioreq server pages special handling into helper
  x86/mm: handle foreign mappings in p2m_entry_modify
  npt/shadow: allow getting foreign page table entries

 xen/arch/x86/hvm/dom0_build.c       |   2 +
 xen/arch/x86/mm/hap/hap.c           |   4 +
 xen/arch/x86/mm/p2m-ept.c           | 137 ++++++----------------------
 xen/arch/x86/mm/p2m-pt.c            |  31 +------
 xen/arch/x86/mm/shadow/common.c     |  13 ++-
 xen/drivers/passthrough/x86/iommu.c |  14 +--
 xen/include/asm-x86/p2m.h           |  58 ++++++++++++
 7 files changed, 110 insertions(+), 149 deletions(-)

-- 
2.20.1



* [PATCH for-4.12 1/8] dom0/pvh: align allocation and mapping order to start address
  2019-01-30 10:36 [PATCH for-4.12 0/8] pvh/dom0/shadow/amd fixes Roger Pau Monne
@ 2019-01-30 10:36 ` Roger Pau Monne
  2019-01-30 12:37   ` Wei Liu
  2019-02-04 16:41   ` Jan Beulich
  2019-01-30 10:36 ` [PATCH for-4.12 2/8] amd/npt: remove assert that prevents creating 2M MMIO entries Roger Pau Monne
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 62+ messages in thread
From: Roger Pau Monne @ 2019-01-30 10:36 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monne

Due to the recent changes in the IOMMU mapping logic, the start
addresses provided need to be aligned to the order intended to be
mapped.

The PVH dom0 domain builder didn't take this into account when
populating the p2m. Fix this by making sure the order is chosen so that
the start address is aligned to it.
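
For illustration only (not part of the patch): the largest order at which
an allocation starting at a given pfn stays naturally aligned is the index
of the lowest set bit of that pfn. A minimal standalone sketch, using
__builtin_ctzl in place of Xen's find_first_set_bit and made-up values:

#include <stdio.h>

/* Largest naturally aligned order at pfn 'start'; 18 stands in for MAX_ORDER. */
static unsigned int max_aligned_order(unsigned long start)
{
    return start ? (unsigned int)__builtin_ctzl(start) : 18;
}

int main(void)
{
    printf("%u\n", max_aligned_order(0x200)); /* order 9: a 2M superpage fits */
    printf("%u\n", max_aligned_order(0x9f));  /* order 0: only 4K pages */
    return 0;
}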

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
Without this patch trying to create a PVH dom0 will trigger an assert
on certain hardware depending on the memory map.
---
 xen/arch/x86/hvm/dom0_build.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 51cf490811..a571d15c13 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -152,6 +152,8 @@ static int __init pvh_populate_memory_range(struct domain *d,
 
         order = get_order_from_pages(end - start + 1);
         order = min(order ? order - 1 : 0, max_order);
+        /* The order allocated and populated must be aligned to the address. */
+        order = min(order, start ? find_first_set_bit(start) : MAX_ORDER);
         page = alloc_domheap_pages(d, order, dom0_memflags | MEMF_no_scrub);
         if ( page == NULL )
         {
-- 
2.20.1



* [PATCH for-4.12 2/8] amd/npt: remove assert that prevents creating 2M MMIO entries
  2019-01-30 10:36 [PATCH for-4.12 0/8] pvh/dom0/shadow/amd fixes Roger Pau Monne
  2019-01-30 10:36 ` [PATCH for-4.12 1/8] dom0/pvh: align allocation and mapping order to start address Roger Pau Monne
@ 2019-01-30 10:36 ` Roger Pau Monne
  2019-02-04 16:48   ` Andrew Cooper
  2019-02-04 16:56   ` Jan Beulich
  2019-01-30 10:36 ` [PATCH for-4.12 3/8] iommu/pvh: add reserved regions below 1MB to the iommu page tables Roger Pau Monne
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 62+ messages in thread
From: Roger Pau Monne @ 2019-01-30 10:36 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Wei Liu, George Dunlap, Andrew Cooper,
	George Dunlap, Jan Beulich, Roger Pau Monne

The assert was originally added to make sure that higher order
regions (> PAGE_ORDER_4K) could not be used to bypass the
mmio_ro_ranges check performed by p2m_type_to_flags.

This however is already checked in set_mmio_p2m_entry, which makes
sure that higher order mappings don't overlap with mmio_ro_ranges,
thus allowing the creation of high order MMIO mappings safely.

Remove the assert to allow 2M entries to be created for MMIO regions
that don't overlap with mmio_ro_ranges.

Suggested-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
---
Without this patch trying to create a PVH dom0 will trigger an assert
on certain hardware depending on the memory map.
---
 xen/arch/x86/mm/p2m-pt.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index 12f92cf1f0..b8996e5415 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -668,7 +668,6 @@ p2m_pt_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn,
         }
 
         ASSERT(p2m_flags_to_type(flags) != p2m_ioreq_server);
-        ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
         l2e_content = mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt)
             ? p2m_l2e_from_pfn(mfn_x(mfn),
                                p2m_type_to_flags(p2m, p2mt, mfn, 1))
-- 
2.20.1



* [PATCH for-4.12 3/8] iommu/pvh: add reserved regions below 1MB to the iommu page tables
  2019-01-30 10:36 [PATCH for-4.12 0/8] pvh/dom0/shadow/amd fixes Roger Pau Monne
  2019-01-30 10:36 ` [PATCH for-4.12 1/8] dom0/pvh: align allocation and mapping order to start address Roger Pau Monne
  2019-01-30 10:36 ` [PATCH for-4.12 2/8] amd/npt: remove assert that prevents creating 2M MMIO entries Roger Pau Monne
@ 2019-01-30 10:36 ` Roger Pau Monne
  2019-02-05 10:47   ` Jan Beulich
  2019-01-30 10:36 ` [PATCH for-4.12 4/8] x86/shadow: alloc enough pages so initialization doesn't fail Roger Pau Monne
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 62+ messages in thread
From: Roger Pau Monne @ 2019-01-30 10:36 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Jan Beulich, Roger Pau Monne

Reserved memory ranges below 1MB on a PVH dom0 are added to the HAP
page tables, but since this is done before setting up the IOMMU the
non-RAM regions in that area are not added to the IOMMU page tables.
Fix this by making sure any reserved regions below 1MB are added to
the IOMMU page tables.
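
As a point of reference, the 1MB boundary used below corresponds to pfn
0x100 (256); a trivial standalone check, with the macros redefined locally
for illustration:

#include <stdio.h>

#define MB(x)       ((unsigned long)(x) << 20)
#define PAGE_SHIFT  12
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)

int main(void)
{
    /*
     * Reserved pfns below this boundary are already present in the HAP
     * tables (added by the dom0 builder), so only the IOMMU mapping is
     * still missing for them.
     */
    printf("1MB boundary pfn: %lu\n", PFN_DOWN(MB(1))); /* prints 256 */
    return 0;
}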

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Juergen Gross <jgross@suse.com>
---
Without this patch, creating a PVH dom0 can cause IOMMU page faults and
non-functional devices.
---
 xen/drivers/passthrough/x86/iommu.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index a88ef9b189..d3639d1538 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -151,12 +151,7 @@ static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
      * inclusive mapping additionally maps in every pfn up to 4GB except those
      * that fall in unusable ranges for PV Dom0.
      */
-    if ( (pfn > max_pfn && !mfn_valid(mfn)) || xen_in_range(pfn) ||
-         /*
-          * Ignore any address below 1MB, that's already identity mapped by the
-          * Dom0 builder for HVM.
-          */
-         (!d->domain_id && is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) )
+    if ( (pfn > max_pfn && !mfn_valid(mfn)) || xen_in_range(pfn) )
         return false;
 
     switch ( type = page_get_ram_type(mfn) )
@@ -245,7 +240,12 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
         if ( !hwdom_iommu_map(d, pfn, max_pfn) )
             continue;
 
-        if ( paging_mode_translate(d) )
+        /*
+         * Don't add any address below 1MB to the HAP page tables, that's
+         * already done by the domain builder. Add addresses below 1MB to the
+         * IOMMU page tables only.
+         */
+        if ( paging_mode_translate(d) && pfn >= PFN_DOWN(MB(1)) )
             rc = set_identity_p2m_entry(d, pfn, p2m_access_rw, 0);
         else
             rc = iommu_map(d, _dfn(pfn), _mfn(pfn), PAGE_ORDER_4K,
-- 
2.20.1



* [PATCH for-4.12 4/8] x86/shadow: alloc enough pages so initialization doesn't fail
  2019-01-30 10:36 [PATCH for-4.12 0/8] pvh/dom0/shadow/amd fixes Roger Pau Monne
                   ` (2 preceding siblings ...)
  2019-01-30 10:36 ` [PATCH for-4.12 3/8] iommu/pvh: add reserved regions below 1MB to the iommu page tables Roger Pau Monne
@ 2019-01-30 10:36 ` Roger Pau Monne
  2019-02-05 11:21   ` Jan Beulich
  2019-01-30 10:36 ` [PATCH for-4.12 5/8] pvh/dom0: warn when dom0_mem is not set to a fixed value Roger Pau Monne
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 62+ messages in thread
From: Roger Pau Monne @ 2019-01-30 10:36 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, George Dunlap, Andrew Cooper, Tim Deegan, Jan Beulich,
	Roger Pau Monne

Current code in shadow_enable will allocate a shadow pool of 4MB
regardless of the values of sh_min_allocation or
shadow_min_acceptable_pages, which means that calls to
shadow_alloc_p2m_page can fail even after the check and allocation
done just above.

Fix this by always checking that the pool is big enough, so that the
rest of shadow_enable cannot fail due to a lack of pages in the shadow
pool. This is relevant to shadow_alloc_p2m_page, which requires a
minimum of shadow_min_acceptable_pages(d) + 1 pages in the pool.

This allows booting a guest that uses shadow paging and has more than
6 vCPUs.
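
To make the failure mode concrete, here is a standalone sketch of the
sizing arithmetic. The 128-pages-per-vCPU figure and the exact formula are
assumptions standing in for whatever shadow_min_acceptable_pages()
actually computes, so treat the numbers as illustrative only:

#include <stdio.h>

int main(void)
{
    unsigned int per_vcpu = 128;   /* assumed per-vCPU shadow page requirement */
    unsigned int vcpus = 7;        /* first configuration reported to fail */
    unsigned int min_acceptable = (vcpus + 1) * per_vcpu; /* assumed formula */
    unsigned int old_floor = 1024;                  /* flat 4MB pool of 4K pages */
    unsigned int new_floor = min_acceptable + 1024; /* floor used by this patch */

    printf("needed later: %u, old floor: %u, new floor: %u\n",
           min_acceptable + 1, old_floor, new_floor);
    /* 1025 > 1024: shadow_alloc_p2m_page() can run the old pool dry. */
    return 0;
}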

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Tim Deegan <tim@xen.org>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
I think this should be considered for 4.12, or else attempting to
create a shadow guest with more than 6 vCPUs fails.
---
 xen/arch/x86/mm/shadow/common.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index c49aeb5e60..78525ddd23 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -2705,6 +2705,11 @@ int shadow_enable(struct domain *d, u32 mode)
     uint32_t *e;
     int rv = 0;
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    /*
+     * Required minimum amount of pool pages plus 4MB. This is required so the
+     * calls to p2m_alloc_table and shadow_alloc_p2m_page below don't fail.
+     */
+    unsigned int min_pages = shadow_min_acceptable_pages(d) + 1024;
 
     mode |= PG_SH_enable;
 
@@ -2719,10 +2724,10 @@ int shadow_enable(struct domain *d, u32 mode)
 
     /* Init the shadow memory allocation if the user hasn't done so */
     old_pages = d->arch.paging.shadow.total_pages;
-    if ( old_pages < sh_min_allocation(d) + d->arch.paging.shadow.p2m_pages )
+    if ( old_pages < min_pages )
     {
         paging_lock(d);
-        rv = shadow_set_allocation(d, 1024, NULL); /* Use at least 4MB */
+        rv = shadow_set_allocation(d, min_pages, NULL);
         if ( rv != 0 )
         {
             shadow_set_allocation(d, 0, NULL);
-- 
2.20.1



* [PATCH for-4.12 5/8] pvh/dom0: warn when dom0_mem is not set to a fixed value
  2019-01-30 10:36 [PATCH for-4.12 0/8] pvh/dom0/shadow/amd fixes Roger Pau Monne
                   ` (3 preceding siblings ...)
  2019-01-30 10:36 ` [PATCH for-4.12 4/8] x86/shadow: alloc enough pages so initialization doesn't fail Roger Pau Monne
@ 2019-01-30 10:36 ` Roger Pau Monne
  2019-02-06 13:54   ` Jan Beulich
  2019-02-07 17:09   ` George Dunlap
  2019-01-30 10:36 ` [PATCH 6/8] x86/mm: split p2m ioreq server pages special handling into helper Roger Pau Monne
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 62+ messages in thread
From: Roger Pau Monne @ 2019-01-30 10:36 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monne

There have been several reports of the dom0 builder running out of
memory when building a PVH dom0 without having specified a dom0_mem
value. Print a warning message if dom0_mem is not set to a fixed value
when booting in PVH mode.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
---
Without this patch, creating a PVH dom0 without a dom0_mem parameter
can result in the dom0 builder running out of memory, thus leading to a
Xen crash. The added message gives the user a hint about a possible
fix.
---
 xen/arch/x86/dom0_build.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index 2b4d9e9ea6..427a665ddc 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -344,6 +344,10 @@ unsigned long __init dom0_compute_nr_pages(
     if ( !dom0_mem_set && CONFIG_DOM0_MEM[0] )
         parse_dom0_mem(CONFIG_DOM0_MEM);
 
+    if ( is_hvm_domain(d) && !dom0_size.nr_pages )
+        printk(
+"WARNING: consider setting dom0_mem to a fixed value when using PVH mode\n");
+
     for_each_node_mask ( node, dom0_nodes )
         avail += avail_domheap_pages_region(node, 0, 0) +
                  initial_images_nrpages(node);
-- 
2.20.1



* [PATCH 6/8] x86/mm: split p2m ioreq server pages special handling into helper
  2019-01-30 10:36 [PATCH for-4.12 0/8] pvh/dom0/shadow/amd fixes Roger Pau Monne
                   ` (4 preceding siblings ...)
  2019-01-30 10:36 ` [PATCH for-4.12 5/8] pvh/dom0: warn when dom0_mem is not set to a fixed value Roger Pau Monne
@ 2019-01-30 10:36 ` Roger Pau Monne
  2019-01-31 14:59   ` Paul Durrant
  2019-01-30 10:36 ` [PATCH 7/8] x86/mm: handle foreign mappings in p2m_entry_modify Roger Pau Monne
  2019-01-30 10:36 ` [PATCH 8/8] npt/shadow: allow getting foreign page table entries Roger Pau Monne
  7 siblings, 1 reply; 62+ messages in thread
From: Roger Pau Monne @ 2019-01-30 10:36 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Wei Liu, Jan Beulich, George Dunlap, Andrew Cooper,
	Paul Durrant, Jun Nakajima, Roger Pau Monne

So that it can be shared by ept, npt and shadow code, instead of
duplicating it.

No change in functionality intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Paul Durrant <paul.durrant@citrix.com>
---
 xen/arch/x86/mm/hap/hap.c       |  3 ++
 xen/arch/x86/mm/p2m-ept.c       | 55 +++++++++++----------------------
 xen/arch/x86/mm/p2m-pt.c        | 20 ------------
 xen/arch/x86/mm/shadow/common.c |  3 ++
 xen/include/asm-x86/p2m.h       | 32 +++++++++++++++++++
 5 files changed, 56 insertions(+), 57 deletions(-)

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index 3d651b94c3..dc46d5e14f 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -734,6 +734,9 @@ hap_write_p2m_entry(struct domain *d, unsigned long gfn, l1_pgentry_t *p,
             && perms_strictly_increased(old_flags, l1e_get_flags(new)) );
     }
 
+    p2m_entry_modify(p2m_get_hostp2m(d), p2m_flags_to_type(l1e_get_flags(new)),
+                     p2m_flags_to_type(old_flags), level);
+
     safe_write_pte(p, new);
     if ( old_flags & _PAGE_PRESENT )
         flush_tlb_mask(d->dirty_cpumask);
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index bb562607f7..0ece6608cb 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -46,7 +46,8 @@ static inline bool_t is_epte_valid(ept_entry_t *e)
 }
 
 /* returns : 0 for success, -errno otherwise */
-static int atomic_write_ept_entry(ept_entry_t *entryptr, ept_entry_t new,
+static int atomic_write_ept_entry(struct p2m_domain *p2m,
+                                  ept_entry_t *entryptr, ept_entry_t new,
                                   int level)
 {
     int rc;
@@ -89,6 +90,8 @@ static int atomic_write_ept_entry(ept_entry_t *entryptr, ept_entry_t new,
     if ( unlikely(p2m_is_foreign(entryptr->sa_p2mt)) && check_foreign )
         oldmfn = entryptr->mfn;
 
+    p2m_entry_modify(p2m, new.sa_p2mt, entryptr->sa_p2mt, level);
+
     write_atomic(&entryptr->epte, new.epte);
 
     if ( unlikely(oldmfn != mfn_x(INVALID_MFN)) )
@@ -390,7 +393,8 @@ static int ept_next_level(struct p2m_domain *p2m, bool_t read_only,
  * present entries in the given page table, optionally marking the entries
  * also for their subtrees needing P2M type re-calculation.
  */
-static bool_t ept_invalidate_emt(mfn_t mfn, bool_t recalc, int level)
+static bool_t ept_invalidate_emt(struct p2m_domain *p2m, mfn_t mfn,
+                                 bool_t recalc, int level)
 {
     int rc;
     ept_entry_t *epte = map_domain_page(mfn);
@@ -408,7 +412,7 @@ static bool_t ept_invalidate_emt(mfn_t mfn, bool_t recalc, int level)
         e.emt = MTRR_NUM_TYPES;
         if ( recalc )
             e.recalc = 1;
-        rc = atomic_write_ept_entry(&epte[i], e, level);
+        rc = atomic_write_ept_entry(p2m, &epte[i], e, level);
         ASSERT(rc == 0);
         changed = 1;
     }
@@ -459,7 +463,7 @@ static int ept_invalidate_emt_range(struct p2m_domain *p2m,
             rc = -ENOMEM;
             goto out;
         }
-        wrc = atomic_write_ept_entry(&table[index], split_ept_entry, i);
+        wrc = atomic_write_ept_entry(p2m, &table[index], split_ept_entry, i);
         ASSERT(wrc == 0);
 
         for ( ; i > target; --i )
@@ -479,7 +483,7 @@ static int ept_invalidate_emt_range(struct p2m_domain *p2m,
         {
             e.emt = MTRR_NUM_TYPES;
             e.recalc = 1;
-            wrc = atomic_write_ept_entry(&table[index], e, target);
+            wrc = atomic_write_ept_entry(p2m, &table[index], e, target);
             ASSERT(wrc == 0);
             rc = 1;
         }
@@ -549,17 +553,11 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
                     nt = p2m_recalc_type(e.recalc, e.sa_p2mt, p2m, gfn + i);
                     if ( nt != e.sa_p2mt )
                     {
-                        if ( e.sa_p2mt == p2m_ioreq_server )
-                        {
-                            ASSERT(p2m->ioreq.entry_count > 0);
-                            p2m->ioreq.entry_count--;
-                        }
-
                         e.sa_p2mt = nt;
                         ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, e.access);
                     }
                     e.recalc = 0;
-                    wrc = atomic_write_ept_entry(&epte[i], e, level);
+                    wrc = atomic_write_ept_entry(p2m, &epte[i], e, level);
                     ASSERT(wrc == 0);
                 }
             }
@@ -595,7 +593,7 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
                 {
                     if ( ept_split_super_page(p2m, &e, level, level - 1) )
                     {
-                        wrc = atomic_write_ept_entry(&epte[i], e, level);
+                        wrc = atomic_write_ept_entry(p2m, &epte[i], e, level);
                         ASSERT(wrc == 0);
                         unmap_domain_page(epte);
                         mfn = e.mfn;
@@ -610,7 +608,7 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
                 e.recalc = 0;
                 if ( recalc && p2m_is_changeable(e.sa_p2mt) )
                     ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, e.access);
-                wrc = atomic_write_ept_entry(&epte[i], e, level);
+                wrc = atomic_write_ept_entry(p2m, &epte[i], e, level);
                 ASSERT(wrc == 0);
             }
 
@@ -621,11 +619,11 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
         if ( e.emt == MTRR_NUM_TYPES )
         {
             ASSERT(is_epte_present(&e));
-            ept_invalidate_emt(_mfn(e.mfn), e.recalc, level);
+            ept_invalidate_emt(p2m, _mfn(e.mfn), e.recalc, level);
             smp_wmb();
             e.emt = 0;
             e.recalc = 0;
-            wrc = atomic_write_ept_entry(&epte[i], e, level);
+            wrc = atomic_write_ept_entry(p2m, &epte[i], e, level);
             ASSERT(wrc == 0);
             unmap_domain_page(epte);
             rc = 1;
@@ -786,7 +784,7 @@ ept_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn,
 
         /* now install the newly split ept sub-tree */
         /* NB: please make sure domian is paused and no in-fly VT-d DMA. */
-        rc = atomic_write_ept_entry(ept_entry, split_ept_entry, i);
+        rc = atomic_write_ept_entry(p2m, ept_entry, split_ept_entry, i);
         ASSERT(rc == 0);
 
         /* then move to the level we want to make real changes */
@@ -833,24 +831,7 @@ ept_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn,
         new_entry.suppress_ve = is_epte_valid(&old_entry) ?
                                     old_entry.suppress_ve : 1;
 
-    /*
-     * p2m_ioreq_server is only used for 4K pages, so the
-     * count is only done on ept page table entries.
-     */
-    if ( p2mt == p2m_ioreq_server )
-    {
-        ASSERT(i == 0);
-        p2m->ioreq.entry_count++;
-    }
-
-    if ( ept_entry->sa_p2mt == p2m_ioreq_server )
-    {
-        ASSERT(i == 0);
-        ASSERT(p2m->ioreq.entry_count > 0);
-        p2m->ioreq.entry_count--;
-    }
-
-    rc = atomic_write_ept_entry(ept_entry, new_entry, target);
+    rc = atomic_write_ept_entry(p2m, ept_entry, new_entry, target);
     if ( unlikely(rc) )
         old_entry.epte = 0;
     else
@@ -1070,7 +1051,7 @@ static void ept_change_entry_type_global(struct p2m_domain *p2m,
     if ( !mfn )
         return;
 
-    if ( ept_invalidate_emt(_mfn(mfn), 1, p2m->ept.wl) )
+    if ( ept_invalidate_emt(p2m, _mfn(mfn), 1, p2m->ept.wl) )
         ept_sync_domain(p2m);
 }
 
@@ -1128,7 +1109,7 @@ static void ept_memory_type_changed(struct p2m_domain *p2m)
     if ( !mfn )
         return;
 
-    if ( ept_invalidate_emt(_mfn(mfn), 0, p2m->ept.wl) )
+    if ( ept_invalidate_emt(p2m, _mfn(mfn), 0, p2m->ept.wl) )
         ept_sync_domain(p2m);
 }
 
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index b8996e5415..c8905a5597 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -436,13 +436,6 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
                 flags |= _PAGE_PSE;
             }
 
-            if ( ot == p2m_ioreq_server )
-            {
-                ASSERT(p2m->ioreq.entry_count > 0);
-                ASSERT(level == 0);
-                p2m->ioreq.entry_count--;
-            }
-
             e = l1e_from_pfn(mfn, flags);
             p2m_add_iommu_flags(&e, level,
                                 (nt == p2m_ram_rw)
@@ -627,19 +620,6 @@ p2m_pt_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn,
 
         p2mt_old = p2m_flags_to_type(l1e_get_flags(*p2m_entry));
 
-        /*
-         * p2m_ioreq_server is only used for 4K pages, so
-         * the count is only done for level 1 entries.
-         */
-        if ( p2mt == p2m_ioreq_server )
-            p2m->ioreq.entry_count++;
-
-        if ( p2mt_old == p2m_ioreq_server )
-        {
-            ASSERT(p2m->ioreq.entry_count > 0);
-            p2m->ioreq.entry_count--;
-        }
-
         /* level 1 entry */
         p2m->write_p2m_entry(p2m, gfn, p2m_entry, entry_content, 1);
         /* NB: paging_write_p2m_entry() handles tlb flushes properly */
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index 78525ddd23..b210c7447e 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -3185,6 +3185,9 @@ shadow_write_p2m_entry(struct domain *d, unsigned long gfn,
     if ( likely(d->arch.paging.shadow.total_pages != 0) )
          sh_unshadow_for_p2m_change(d, gfn, p, new, level);
 
+    p2m_entry_modify(p2m_get_hostp2m(d), p2m_flags_to_type(l1e_get_flags(new)),
+                     p2m_flags_to_type(l1e_get_flags(*p)), level);
+
     /* Update the entry with new content */
     safe_write_pte(p, new);
 
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 2095076556..834d49d2d4 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -932,6 +932,38 @@ int p2m_set_ioreq_server(struct domain *d, unsigned int flags,
 struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
                                               unsigned int *flags);
 
+static inline void p2m_entry_modify(struct p2m_domain *p2m, p2m_type_t nt,
+                                    p2m_type_t ot, unsigned int level)
+{
+    if ( level != 1 || nt == ot )
+        return;
+
+    switch ( nt )
+    {
+    case p2m_ioreq_server:
+        /*
+         * p2m_ioreq_server is only used for 4K pages, so
+         * the count is only done for level 1 entries.
+         */
+        p2m->ioreq.entry_count++;
+        break;
+
+    default:
+        break;
+    }
+
+    switch ( ot )
+    {
+    case p2m_ioreq_server:
+        ASSERT(p2m->ioreq.entry_count > 0);
+        p2m->ioreq.entry_count--;
+        break;
+
+    default:
+        break;
+    }
+}
+
 #endif /* _XEN_ASM_X86_P2M_H */
 
 /*
-- 
2.20.1



* [PATCH 7/8] x86/mm: handle foreign mappings in p2m_entry_modify
  2019-01-30 10:36 [PATCH for-4.12 0/8] pvh/dom0/shadow/amd fixes Roger Pau Monne
                   ` (5 preceding siblings ...)
  2019-01-30 10:36 ` [PATCH 6/8] x86/mm: split p2m ioreq server pages special handling into helper Roger Pau Monne
@ 2019-01-30 10:36 ` Roger Pau Monne
  2019-02-06 16:59   ` Jan Beulich
  2019-02-07 17:49   ` George Dunlap
  2019-01-30 10:36 ` [PATCH 8/8] npt/shadow: allow getting foreign page table entries Roger Pau Monne
  7 siblings, 2 replies; 62+ messages in thread
From: Roger Pau Monne @ 2019-01-30 10:36 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Wei Liu, Jan Beulich, George Dunlap, Andrew Cooper,
	Tim Deegan, Jun Nakajima, Roger Pau Monne

So that the specific handling can be removed from
atomic_write_ept_entry and shared with the npt and shadow code.

This commit also removes the check that prevented a non-ept PVH dom0
from mapping foreign pages.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Tim Deegan <tim@xen.org>
---
 xen/arch/x86/mm/hap/hap.c       |   3 +-
 xen/arch/x86/mm/p2m-ept.c       | 108 +++++++-------------------------
 xen/arch/x86/mm/p2m-pt.c        |   7 ---
 xen/arch/x86/mm/shadow/common.c |   3 +-
 xen/include/asm-x86/p2m.h       |  30 ++++++++-
 5 files changed, 53 insertions(+), 98 deletions(-)

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index dc46d5e14f..4f52639be5 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -735,7 +735,8 @@ hap_write_p2m_entry(struct domain *d, unsigned long gfn, l1_pgentry_t *p,
     }
 
     p2m_entry_modify(p2m_get_hostp2m(d), p2m_flags_to_type(l1e_get_flags(new)),
-                     p2m_flags_to_type(old_flags), level);
+                     p2m_flags_to_type(old_flags), l1e_get_mfn(new),
+                     l1e_get_mfn(*p), level);
 
     safe_write_pte(p, new);
     if ( old_flags & _PAGE_PRESENT )
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index 0ece6608cb..2b0c3ab265 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -45,65 +45,13 @@ static inline bool_t is_epte_valid(ept_entry_t *e)
     return ((e->epte & ~(1ul << 63)) != 0 && e->sa_p2mt != p2m_invalid);
 }
 
-/* returns : 0 for success, -errno otherwise */
-static int atomic_write_ept_entry(struct p2m_domain *p2m,
-                                  ept_entry_t *entryptr, ept_entry_t new,
-                                  int level)
+static void atomic_write_ept_entry(struct p2m_domain *p2m,
+                                   ept_entry_t *entryptr, ept_entry_t new,
+                                   int level)
 {
-    int rc;
-    unsigned long oldmfn = mfn_x(INVALID_MFN);
-    bool_t check_foreign = (new.mfn != entryptr->mfn ||
-                            new.sa_p2mt != entryptr->sa_p2mt);
-
-    if ( level )
-    {
-        ASSERT(!is_epte_superpage(&new) || !p2m_is_foreign(new.sa_p2mt));
-        write_atomic(&entryptr->epte, new.epte);
-        return 0;
-    }
-
-    if ( unlikely(p2m_is_foreign(new.sa_p2mt)) )
-    {
-        rc = -EINVAL;
-        if ( !is_epte_present(&new) )
-                goto out;
-
-        if ( check_foreign )
-        {
-            struct domain *fdom;
-
-            if ( !mfn_valid(_mfn(new.mfn)) )
-                goto out;
-
-            rc = -ESRCH;
-            fdom = page_get_owner(mfn_to_page(_mfn(new.mfn)));
-            if ( fdom == NULL )
-                goto out;
-
-            /* get refcount on the page */
-            rc = -EBUSY;
-            if ( !get_page(mfn_to_page(_mfn(new.mfn)), fdom) )
-                goto out;
-        }
-    }
-
-    if ( unlikely(p2m_is_foreign(entryptr->sa_p2mt)) && check_foreign )
-        oldmfn = entryptr->mfn;
-
-    p2m_entry_modify(p2m, new.sa_p2mt, entryptr->sa_p2mt, level);
-
+    p2m_entry_modify(p2m, new.sa_p2mt, entryptr->sa_p2mt, _mfn(new.mfn),
+                     _mfn(entryptr->mfn), level);
     write_atomic(&entryptr->epte, new.epte);
-
-    if ( unlikely(oldmfn != mfn_x(INVALID_MFN)) )
-        put_page(mfn_to_page(_mfn(oldmfn)));
-
-    rc = 0;
-
- out:
-    if ( rc )
-        gdprintk(XENLOG_ERR, "epte o:%"PRIx64" n:%"PRIx64" rc:%d\n",
-                 entryptr->epte, new.epte, rc);
-    return rc;
 }
 
 static void ept_p2m_type_to_flags(struct p2m_domain *p2m, ept_entry_t *entry,
@@ -396,7 +344,6 @@ static int ept_next_level(struct p2m_domain *p2m, bool_t read_only,
 static bool_t ept_invalidate_emt(struct p2m_domain *p2m, mfn_t mfn,
                                  bool_t recalc, int level)
 {
-    int rc;
     ept_entry_t *epte = map_domain_page(mfn);
     unsigned int i;
     bool_t changed = 0;
@@ -412,8 +359,7 @@ static bool_t ept_invalidate_emt(struct p2m_domain *p2m, mfn_t mfn,
         e.emt = MTRR_NUM_TYPES;
         if ( recalc )
             e.recalc = 1;
-        rc = atomic_write_ept_entry(p2m, &epte[i], e, level);
-        ASSERT(rc == 0);
+        atomic_write_ept_entry(p2m, &epte[i], e, level);
         changed = 1;
     }
 
@@ -437,7 +383,7 @@ static int ept_invalidate_emt_range(struct p2m_domain *p2m,
     ept_entry_t *table;
     unsigned long gfn_remainder = first_gfn;
     unsigned int i, index;
-    int wrc, rc = 0, ret = GUEST_TABLE_MAP_FAILED;
+    int rc = 0, ret = GUEST_TABLE_MAP_FAILED;
 
     table = map_domain_page(pagetable_get_mfn(p2m_get_pagetable(p2m)));
     for ( i = p2m->ept.wl; i > target; --i )
@@ -463,8 +409,7 @@ static int ept_invalidate_emt_range(struct p2m_domain *p2m,
             rc = -ENOMEM;
             goto out;
         }
-        wrc = atomic_write_ept_entry(p2m, &table[index], split_ept_entry, i);
-        ASSERT(wrc == 0);
+        atomic_write_ept_entry(p2m, &table[index], split_ept_entry, i);
 
         for ( ; i > target; --i )
             if ( !ept_next_level(p2m, 1, &table, &gfn_remainder, i) )
@@ -483,8 +428,7 @@ static int ept_invalidate_emt_range(struct p2m_domain *p2m,
         {
             e.emt = MTRR_NUM_TYPES;
             e.recalc = 1;
-            wrc = atomic_write_ept_entry(p2m, &table[index], e, target);
-            ASSERT(wrc == 0);
+            atomic_write_ept_entry(p2m, &table[index], e, target);
             rc = 1;
         }
     }
@@ -513,7 +457,7 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
     unsigned int level = ept->wl;
     unsigned long mfn = ept->mfn;
     ept_entry_t *epte;
-    int wrc, rc = 0;
+    int rc = 0;
 
     if ( !mfn )
         return 0;
@@ -557,8 +501,7 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
                         ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, e.access);
                     }
                     e.recalc = 0;
-                    wrc = atomic_write_ept_entry(p2m, &epte[i], e, level);
-                    ASSERT(wrc == 0);
+                    atomic_write_ept_entry(p2m, &epte[i], e, level);
                 }
             }
             else
@@ -593,8 +536,7 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
                 {
                     if ( ept_split_super_page(p2m, &e, level, level - 1) )
                     {
-                        wrc = atomic_write_ept_entry(p2m, &epte[i], e, level);
-                        ASSERT(wrc == 0);
+                        atomic_write_ept_entry(p2m, &epte[i], e, level);
                         unmap_domain_page(epte);
                         mfn = e.mfn;
                         continue;
@@ -608,8 +550,7 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
                 e.recalc = 0;
                 if ( recalc && p2m_is_changeable(e.sa_p2mt) )
                     ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, e.access);
-                wrc = atomic_write_ept_entry(p2m, &epte[i], e, level);
-                ASSERT(wrc == 0);
+                atomic_write_ept_entry(p2m, &epte[i], e, level);
             }
 
             rc = 1;
@@ -623,8 +564,7 @@ static int resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
             smp_wmb();
             e.emt = 0;
             e.recalc = 0;
-            wrc = atomic_write_ept_entry(p2m, &epte[i], e, level);
-            ASSERT(wrc == 0);
+            atomic_write_ept_entry(p2m, &epte[i], e, level);
             unmap_domain_page(epte);
             rc = 1;
         }
@@ -784,8 +724,7 @@ ept_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn,
 
         /* now install the newly split ept sub-tree */
         /* NB: please make sure domian is paused and no in-fly VT-d DMA. */
-        rc = atomic_write_ept_entry(p2m, ept_entry, split_ept_entry, i);
-        ASSERT(rc == 0);
+        atomic_write_ept_entry(p2m, ept_entry, split_ept_entry, i);
 
         /* then move to the level we want to make real changes */
         for ( ; i > target; i-- )
@@ -831,18 +770,13 @@ ept_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn,
         new_entry.suppress_ve = is_epte_valid(&old_entry) ?
                                     old_entry.suppress_ve : 1;
 
-    rc = atomic_write_ept_entry(p2m, ept_entry, new_entry, target);
-    if ( unlikely(rc) )
-        old_entry.epte = 0;
-    else
-    {
-        entry_written = 1;
+    atomic_write_ept_entry(p2m, ept_entry, new_entry, target);
+    entry_written = 1;
 
-        if ( p2mt != p2m_invalid &&
-             (gfn + (1UL << order) - 1 > p2m->max_mapped_pfn) )
-            /* Track the highest gfn for which we have ever had a valid mapping */
-            p2m->max_mapped_pfn = gfn + (1UL << order) - 1;
-    }
+    if ( p2mt != p2m_invalid &&
+         (gfn + (1UL << order) - 1 > p2m->max_mapped_pfn) )
+        /* Track the highest gfn for which we have ever had a valid mapping */
+        p2m->max_mapped_pfn = gfn + (1UL << order) - 1;
 
 out:
     if ( needs_sync )
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index c8905a5597..6238043aa6 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -523,13 +523,6 @@ p2m_pt_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn,
         __trace_var(TRC_MEM_SET_P2M_ENTRY, 0, sizeof(t), &t);
     }
 
-    if ( unlikely(p2m_is_foreign(p2mt)) )
-    {
-        /* hvm fixme: foreign types are only supported on ept at present */
-        gdprintk(XENLOG_WARNING, "Unimplemented foreign p2m type.\n");
-        return -EINVAL;
-    }
-
     /* Carry out any eventually pending earlier changes first. */
     rc = do_recalc(p2m, gfn);
     if ( rc < 0 )
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index b210c7447e..7f0e05045e 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -3186,7 +3186,8 @@ shadow_write_p2m_entry(struct domain *d, unsigned long gfn,
          sh_unshadow_for_p2m_change(d, gfn, p, new, level);
 
     p2m_entry_modify(p2m_get_hostp2m(d), p2m_flags_to_type(l1e_get_flags(new)),
-                     p2m_flags_to_type(l1e_get_flags(*p)), level);
+                     p2m_flags_to_type(l1e_get_flags(*p)), l1e_get_mfn(new),
+                     l1e_get_mfn(*p), level);
 
     /* Update the entry with new content */
     safe_write_pte(p, new);
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 834d49d2d4..1cc8acb3fe 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -933,9 +933,12 @@ struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
                                               unsigned int *flags);
 
 static inline void p2m_entry_modify(struct p2m_domain *p2m, p2m_type_t nt,
-                                    p2m_type_t ot, unsigned int level)
+                                    p2m_type_t ot, mfn_t nfn, mfn_t ofn,
+                                    unsigned int level)
 {
-    if ( level != 1 || nt == ot )
+    struct page_info *pg;
+
+    if ( level != 1 || (nt == ot && mfn_eq(nfn, ofn)) )
         return;
 
     switch ( nt )
@@ -948,6 +951,17 @@ static inline void p2m_entry_modify(struct p2m_domain *p2m, p2m_type_t nt,
         p2m->ioreq.entry_count++;
         break;
 
+    case p2m_map_foreign:
+        pg = mfn_to_page(nfn);
+
+        if ( !pg || !page_get_owner_and_reference(pg) )
+        {
+            ASSERT_UNREACHABLE();
+            return;
+        }
+
+        break;
+
     default:
         break;
     }
@@ -959,6 +973,18 @@ static inline void p2m_entry_modify(struct p2m_domain *p2m, p2m_type_t nt,
         p2m->ioreq.entry_count--;
         break;
 
+    case p2m_map_foreign:
+        pg = mfn_to_page(ofn);
+
+        if ( !pg )
+        {
+            ASSERT_UNREACHABLE();
+            return;
+        }
+        put_page(pg);
+
+        break;
+
     default:
         break;
     }
-- 
2.20.1



* [PATCH 8/8] npt/shadow: allow getting foreign page table entries
  2019-01-30 10:36 [PATCH for-4.12 0/8] pvh/dom0/shadow/amd fixes Roger Pau Monne
                   ` (6 preceding siblings ...)
  2019-01-30 10:36 ` [PATCH 7/8] x86/mm: handle foreign mappings in p2m_entry_modify Roger Pau Monne
@ 2019-01-30 10:36 ` Roger Pau Monne
  7 siblings, 0 replies; 62+ messages in thread
From: Roger Pau Monne @ 2019-01-30 10:36 UTC (permalink / raw)
  To: xen-devel
  Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monne

The current npt and shadow code for getting an entry will always
return INVALID_MFN for foreign entries. Allow returning the entry mfn
for foreign entries, like it's done for grant table entries.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm/p2m-pt.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index 6238043aa6..973261a3a8 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -852,7 +852,8 @@ pod_retry_l1:
     unmap_domain_page(l1e);
 
     ASSERT(mfn_valid(mfn) || !p2m_is_ram(*t) || p2m_is_paging(*t));
-    return (p2m_is_valid(*t) || p2m_is_grant(*t)) ? mfn : INVALID_MFN;
+    return (p2m_is_valid(*t) || p2m_is_grant(*t) || p2m_is_foreign(*t))
+           ? mfn : INVALID_MFN;
 }
 
 static void p2m_pt_change_entry_type_global(struct p2m_domain *p2m,
-- 
2.20.1



* Re: [PATCH for-4.12 1/8] dom0/pvh: align allocation and mapping order to start address
  2019-01-30 10:36 ` [PATCH for-4.12 1/8] dom0/pvh: align allocation and mapping order to start address Roger Pau Monne
@ 2019-01-30 12:37   ` Wei Liu
  2019-01-30 13:58     ` Roger Pau Monné
  2019-02-04 16:41   ` Jan Beulich
  1 sibling, 1 reply; 62+ messages in thread
From: Wei Liu @ 2019-01-30 12:37 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Wei Liu, Jan Beulich, Andrew Cooper

On Wed, Jan 30, 2019 at 11:36:39AM +0100, Roger Pau Monne wrote:
> Due to the recent changes in the iommu mapping logic, the start
> addresses provided need to be aligned to the order intended to be
> mapped.
> 

Can you reference some commits here? What would happen if the address is
not aligned?

> dom0 PVH domain builder didn't take this into account when populating
> the p2m, fix this by making sure the order is chosen so that the start
> address is aligned to it.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
> Without this patch trying to create a PVH dom0 will trigger an assert
> on certain hardware depending on the memory map.
> ---
>  xen/arch/x86/hvm/dom0_build.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
> index 51cf490811..a571d15c13 100644
> --- a/xen/arch/x86/hvm/dom0_build.c
> +++ b/xen/arch/x86/hvm/dom0_build.c
> @@ -152,6 +152,8 @@ static int __init pvh_populate_memory_range(struct domain *d,
>  
>          order = get_order_from_pages(end - start + 1);
>          order = min(order ? order - 1 : 0, max_order);
> +        /* The order allocated and populated must be aligned to the address. */
> +        order = min(order, start ? find_first_set_bit(start) : MAX_ORDER);

Isn't max_order better here?

>          page = alloc_domheap_pages(d, order, dom0_memflags | MEMF_no_scrub);
>          if ( page == NULL )
>          {
> -- 
> 2.20.1
> 


* Re: [PATCH for-4.12 1/8] dom0/pvh: align allocation and mapping order to start address
  2019-01-30 12:37   ` Wei Liu
@ 2019-01-30 13:58     ` Roger Pau Monné
  2019-01-31 17:22       ` Wei Liu
  0 siblings, 1 reply; 62+ messages in thread
From: Roger Pau Monné @ 2019-01-30 13:58 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, Jan Beulich, Andrew Cooper

On Wed, Jan 30, 2019 at 12:37:28PM +0000, Wei Liu wrote:
> On Wed, Jan 30, 2019 at 11:36:39AM +0100, Roger Pau Monne wrote:
> > Due to the recent changes in the iommu mapping logic, the start
> > addresses provided need to be aligned to the order intended to be
> > mapped.
> > 
> 
> Can you reference some commits here? What would happen if the address is
> not aligned?

See:
https://lists.xenproject.org/archives/html/xen-devel/2019-01/msg01030.html

and

https://lists.xenproject.org/archives/html/xen-devel/2019-01/msg01503.html

> > dom0 PVH domain builder didn't take this into account when populating
> > the p2m, fix this by making sure the order is chosen so that the start
> > address is aligned to it.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Cc: Jan Beulich <jbeulich@suse.com>
> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > Cc: Wei Liu <wei.liu2@citrix.com>
> > ---
> > Without this patch trying to create a PVH dom0 will trigger an assert
> > on certain hardware depending on the memory map.
> > ---
> >  xen/arch/x86/hvm/dom0_build.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
> > index 51cf490811..a571d15c13 100644
> > --- a/xen/arch/x86/hvm/dom0_build.c
> > +++ b/xen/arch/x86/hvm/dom0_build.c
> > @@ -152,6 +152,8 @@ static int __init pvh_populate_memory_range(struct domain *d,
> >  
> >          order = get_order_from_pages(end - start + 1);
> >          order = min(order ? order - 1 : 0, max_order);
> > +        /* The order allocated and populated must be aligned to the address. */
> > +        order = min(order, start ? find_first_set_bit(start) : MAX_ORDER);
> 
> Isn't max_order better here?

It will yield the same result because order has already been limited
by max_order. I've used MAX_ORDER directly because it's a constant and
could be faster than loading the value in max_order. You could also
use 'order' instead of MAX_ORDER and it would also yield the same
result.
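
A quick standalone illustration of that equivalence, with made-up numbers
and min() open-coded:

#include <stdio.h>

#define MAX_ORDER 18

static unsigned int min_u(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

int main(void)
{
    unsigned int max_order = 9;
    unsigned int order = min_u(12, max_order);  /* already clamped to 9 */
    unsigned long start = 0;                    /* the case where the fallback is used */

    /* order <= max_order <= MAX_ORDER, so both fallbacks are no-ops here. */
    printf("%u\n", min_u(order, start ? __builtin_ctzl(start) : MAX_ORDER)); /* 9 */
    printf("%u\n", min_u(order, start ? __builtin_ctzl(start) : max_order)); /* 9 */
    return 0;
}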

Thanks, Roger.


* Re: [PATCH 6/8] x86/mm: split p2m ioreq server pages special handling into helper
  2019-01-30 10:36 ` [PATCH 6/8] x86/mm: split p2m ioreq server pages special handling into helper Roger Pau Monne
@ 2019-01-31 14:59   ` Paul Durrant
  2019-01-31 16:58     ` Roger Pau Monné
  0 siblings, 1 reply; 62+ messages in thread
From: Paul Durrant @ 2019-01-31 14:59 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Andrew Cooper, George Dunlap,
	Jun Nakajima, Roger Pau Monne

> -----Original Message-----
> From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> Sent: 30 January 2019 10:37
> To: xen-devel@lists.xenproject.org
> Cc: Roger Pau Monne <roger.pau@citrix.com>; George Dunlap
> <George.Dunlap@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew Cooper
> <Andrew.Cooper3@citrix.com>; Wei Liu <wei.liu2@citrix.com>; Jun Nakajima
> <jun.nakajima@intel.com>; Kevin Tian <kevin.tian@intel.com>; Paul Durrant
> <Paul.Durrant@citrix.com>
> Subject: [PATCH 6/8] x86/mm: split p2m ioreq server pages special handling
> into helper
> 
> So that it can be shared by both ept, npt and shadow code, instead of
> duplicating it.
> 
> No change in functionality intended.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: Jun Nakajima <jun.nakajima@intel.com>
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Paul Durrant <paul.durrant@citrix.com>
> ---
>  xen/arch/x86/mm/hap/hap.c       |  3 ++
>  xen/arch/x86/mm/p2m-ept.c       | 55 +++++++++++----------------------
>  xen/arch/x86/mm/p2m-pt.c        | 20 ------------
>  xen/arch/x86/mm/shadow/common.c |  3 ++
>  xen/include/asm-x86/p2m.h       | 32 +++++++++++++++++++
>  5 files changed, 56 insertions(+), 57 deletions(-)
> 
> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
> index 3d651b94c3..dc46d5e14f 100644
> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -734,6 +734,9 @@ hap_write_p2m_entry(struct domain *d, unsigned long
> gfn, l1_pgentry_t *p,
>              && perms_strictly_increased(old_flags, l1e_get_flags(new)) );
>      }
> 
> +    p2m_entry_modify(p2m_get_hostp2m(d),
> p2m_flags_to_type(l1e_get_flags(new)),
> +                     p2m_flags_to_type(old_flags), level);
> +
>      safe_write_pte(p, new);
>      if ( old_flags & _PAGE_PRESENT )
>          flush_tlb_mask(d->dirty_cpumask);
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index bb562607f7..0ece6608cb 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -46,7 +46,8 @@ static inline bool_t is_epte_valid(ept_entry_t *e)
>  }
> 
>  /* returns : 0 for success, -errno otherwise */
> -static int atomic_write_ept_entry(ept_entry_t *entryptr, ept_entry_t new,
> +static int atomic_write_ept_entry(struct p2m_domain *p2m,
> +                                  ept_entry_t *entryptr, ept_entry_t new,
>                                    int level)
>  {
>      int rc;
> @@ -89,6 +90,8 @@ static int atomic_write_ept_entry(ept_entry_t *entryptr,
> ept_entry_t new,
>      if ( unlikely(p2m_is_foreign(entryptr->sa_p2mt)) && check_foreign )
>          oldmfn = entryptr->mfn;
> 
> +    p2m_entry_modify(p2m, new.sa_p2mt, entryptr->sa_p2mt, level);
> +
>      write_atomic(&entryptr->epte, new.epte);
> 
>      if ( unlikely(oldmfn != mfn_x(INVALID_MFN)) )
> @@ -390,7 +393,8 @@ static int ept_next_level(struct p2m_domain *p2m,
> bool_t read_only,
>   * present entries in the given page table, optionally marking the
> entries
>   * also for their subtrees needing P2M type re-calculation.
>   */
> -static bool_t ept_invalidate_emt(mfn_t mfn, bool_t recalc, int level)
> +static bool_t ept_invalidate_emt(struct p2m_domain *p2m, mfn_t mfn,
> +                                 bool_t recalc, int level)
>  {
>      int rc;
>      ept_entry_t *epte = map_domain_page(mfn);
> @@ -408,7 +412,7 @@ static bool_t ept_invalidate_emt(mfn_t mfn, bool_t
> recalc, int level)
>          e.emt = MTRR_NUM_TYPES;
>          if ( recalc )
>              e.recalc = 1;
> -        rc = atomic_write_ept_entry(&epte[i], e, level);
> +        rc = atomic_write_ept_entry(p2m, &epte[i], e, level);
>          ASSERT(rc == 0);
>          changed = 1;
>      }
> @@ -459,7 +463,7 @@ static int ept_invalidate_emt_range(struct p2m_domain
> *p2m,
>              rc = -ENOMEM;
>              goto out;
>          }
> -        wrc = atomic_write_ept_entry(&table[index], split_ept_entry, i);
> +        wrc = atomic_write_ept_entry(p2m, &table[index], split_ept_entry,
> i);
>          ASSERT(wrc == 0);
> 
>          for ( ; i > target; --i )
> @@ -479,7 +483,7 @@ static int ept_invalidate_emt_range(struct p2m_domain
> *p2m,
>          {
>              e.emt = MTRR_NUM_TYPES;
>              e.recalc = 1;
> -            wrc = atomic_write_ept_entry(&table[index], e, target);
> +            wrc = atomic_write_ept_entry(p2m, &table[index], e, target);
>              ASSERT(wrc == 0);
>              rc = 1;
>          }
> @@ -549,17 +553,11 @@ static int resolve_misconfig(struct p2m_domain *p2m,
> unsigned long gfn)
>                      nt = p2m_recalc_type(e.recalc, e.sa_p2mt, p2m, gfn +
> i);
>                      if ( nt != e.sa_p2mt )
>                      {
> -                        if ( e.sa_p2mt == p2m_ioreq_server )
> -                        {
> -                            ASSERT(p2m->ioreq.entry_count > 0);
> -                            p2m->ioreq.entry_count--;
> -                        }
> -
>                          e.sa_p2mt = nt;
>                          ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt,
> e.access);
>                      }
>                      e.recalc = 0;
> -                    wrc = atomic_write_ept_entry(&epte[i], e, level);
> +                    wrc = atomic_write_ept_entry(p2m, &epte[i], e,
> level);
>                      ASSERT(wrc == 0);
>                  }
>              }
> @@ -595,7 +593,7 @@ static int resolve_misconfig(struct p2m_domain *p2m,
> unsigned long gfn)
>                  {
>                      if ( ept_split_super_page(p2m, &e, level, level - 1)
> )
>                      {
> -                        wrc = atomic_write_ept_entry(&epte[i], e, level);
> +                        wrc = atomic_write_ept_entry(p2m, &epte[i], e,
> level);
>                          ASSERT(wrc == 0);
>                          unmap_domain_page(epte);
>                          mfn = e.mfn;
> @@ -610,7 +608,7 @@ static int resolve_misconfig(struct p2m_domain *p2m,
> unsigned long gfn)
>                  e.recalc = 0;
>                  if ( recalc && p2m_is_changeable(e.sa_p2mt) )
>                      ept_p2m_type_to_flags(p2m, &e, e.sa_p2mt, e.access);
> -                wrc = atomic_write_ept_entry(&epte[i], e, level);
> +                wrc = atomic_write_ept_entry(p2m, &epte[i], e, level);
>                  ASSERT(wrc == 0);
>              }
> 
> @@ -621,11 +619,11 @@ static int resolve_misconfig(struct p2m_domain *p2m,
> unsigned long gfn)
>          if ( e.emt == MTRR_NUM_TYPES )
>          {
>              ASSERT(is_epte_present(&e));
> -            ept_invalidate_emt(_mfn(e.mfn), e.recalc, level);
> +            ept_invalidate_emt(p2m, _mfn(e.mfn), e.recalc, level);
>              smp_wmb();
>              e.emt = 0;
>              e.recalc = 0;
> -            wrc = atomic_write_ept_entry(&epte[i], e, level);
> +            wrc = atomic_write_ept_entry(p2m, &epte[i], e, level);
>              ASSERT(wrc == 0);
>              unmap_domain_page(epte);
>              rc = 1;
> @@ -786,7 +784,7 @@ ept_set_entry(struct p2m_domain *p2m, gfn_t gfn_,
> mfn_t mfn,
> 
>          /* now install the newly split ept sub-tree */
>          /* NB: please make sure domian is paused and no in-fly VT-d DMA.
> */
> -        rc = atomic_write_ept_entry(ept_entry, split_ept_entry, i);
> +        rc = atomic_write_ept_entry(p2m, ept_entry, split_ept_entry, i);
>          ASSERT(rc == 0);
> 
>          /* then move to the level we want to make real changes */
> @@ -833,24 +831,7 @@ ept_set_entry(struct p2m_domain *p2m, gfn_t gfn_,
> mfn_t mfn,
>          new_entry.suppress_ve = is_epte_valid(&old_entry) ?
>                                      old_entry.suppress_ve : 1;
> 
> -    /*
> -     * p2m_ioreq_server is only used for 4K pages, so the
> -     * count is only done on ept page table entries.
> -     */
> -    if ( p2mt == p2m_ioreq_server )
> -    {
> -        ASSERT(i == 0);
> -        p2m->ioreq.entry_count++;
> -    }
> -
> -    if ( ept_entry->sa_p2mt == p2m_ioreq_server )
> -    {
> -        ASSERT(i == 0);
> -        ASSERT(p2m->ioreq.entry_count > 0);
> -        p2m->ioreq.entry_count--;
> -    }
> -
> -    rc = atomic_write_ept_entry(ept_entry, new_entry, target);
> +    rc = atomic_write_ept_entry(p2m, ept_entry, new_entry, target);
>      if ( unlikely(rc) )
>          old_entry.epte = 0;
>      else
> @@ -1070,7 +1051,7 @@ static void ept_change_entry_type_global(struct p2m_domain *p2m,
>      if ( !mfn )
>          return;
> 
> -    if ( ept_invalidate_emt(_mfn(mfn), 1, p2m->ept.wl) )
> +    if ( ept_invalidate_emt(p2m, _mfn(mfn), 1, p2m->ept.wl) )
>          ept_sync_domain(p2m);
>  }
> 
> @@ -1128,7 +1109,7 @@ static void ept_memory_type_changed(struct p2m_domain *p2m)
>      if ( !mfn )
>          return;
> 
> -    if ( ept_invalidate_emt(_mfn(mfn), 0, p2m->ept.wl) )
> +    if ( ept_invalidate_emt(p2m, _mfn(mfn), 0, p2m->ept.wl) )
>          ept_sync_domain(p2m);
>  }
> 
> diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
> index b8996e5415..c8905a5597 100644
> --- a/xen/arch/x86/mm/p2m-pt.c
> +++ b/xen/arch/x86/mm/p2m-pt.c
> @@ -436,13 +436,6 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
>                  flags |= _PAGE_PSE;
>              }
> 
> -            if ( ot == p2m_ioreq_server )
> -            {
> -                ASSERT(p2m->ioreq.entry_count > 0);
> -                ASSERT(level == 0);
> -                p2m->ioreq.entry_count--;
> -            }
> -
>              e = l1e_from_pfn(mfn, flags);
>              p2m_add_iommu_flags(&e, level,
>                                  (nt == p2m_ram_rw)
> @@ -627,19 +620,6 @@ p2m_pt_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn,
> 
>          p2mt_old = p2m_flags_to_type(l1e_get_flags(*p2m_entry));
> 
> -        /*
> -         * p2m_ioreq_server is only used for 4K pages, so
> -         * the count is only done for level 1 entries.
> -         */
> -        if ( p2mt == p2m_ioreq_server )
> -            p2m->ioreq.entry_count++;
> -
> -        if ( p2mt_old == p2m_ioreq_server )
> -        {
> -            ASSERT(p2m->ioreq.entry_count > 0);
> -            p2m->ioreq.entry_count--;
> -        }
> -

Is p2mt_old still used once this has gone? In my slightly out-of-date branch it looks like it could be removed.

  Paul

>          /* level 1 entry */
>          p2m->write_p2m_entry(p2m, gfn, p2m_entry, entry_content, 1);
>          /* NB: paging_write_p2m_entry() handles tlb flushes properly */
> diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
> index 78525ddd23..b210c7447e 100644
> --- a/xen/arch/x86/mm/shadow/common.c
> +++ b/xen/arch/x86/mm/shadow/common.c
> @@ -3185,6 +3185,9 @@ shadow_write_p2m_entry(struct domain *d, unsigned long gfn,
>      if ( likely(d->arch.paging.shadow.total_pages != 0) )
>           sh_unshadow_for_p2m_change(d, gfn, p, new, level);
> 
> +    p2m_entry_modify(p2m_get_hostp2m(d), p2m_flags_to_type(l1e_get_flags(new)),
> +                     p2m_flags_to_type(l1e_get_flags(*p)), level);
> +
>      /* Update the entry with new content */
>      safe_write_pte(p, new);
> 
> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index 2095076556..834d49d2d4 100644
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -932,6 +932,38 @@ int p2m_set_ioreq_server(struct domain *d, unsigned int flags,
>  struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
>                                                unsigned int *flags);
> 
> +static inline void p2m_entry_modify(struct p2m_domain *p2m, p2m_type_t nt,
> +                                    p2m_type_t ot, unsigned int level)
> +{
> +    if ( level != 1 || nt == ot )
> +        return;
> +
> +    switch ( nt )
> +    {
> +    case p2m_ioreq_server:
> +        /*
> +         * p2m_ioreq_server is only used for 4K pages, so
> +         * the count is only done for level 1 entries.
> +         */
> +        p2m->ioreq.entry_count++;
> +        break;
> +
> +    default:
> +        break;
> +    }
> +
> +    switch ( ot )
> +    {
> +    case p2m_ioreq_server:
> +        ASSERT(p2m->ioreq.entry_count > 0);
> +        p2m->ioreq.entry_count--;
> +        break;
> +
> +    default:
> +        break;
> +    }
> +}
> +
>  #endif /* _XEN_ASM_X86_P2M_H */
> 
>  /*
> --
> 2.20.1

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 6/8] x86/mm: split p2m ioreq server pages special handling into helper
  2019-01-31 14:59   ` Paul Durrant
@ 2019-01-31 16:58     ` Roger Pau Monné
  0 siblings, 0 replies; 62+ messages in thread
From: Roger Pau Monné @ 2019-01-31 16:58 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Kevin Tian, Wei Liu, Jan Beulich, Andrew Cooper, George Dunlap,
	Jun Nakajima, xen-devel

On Thu, Jan 31, 2019 at 03:59:50PM +0100, Paul Durrant wrote:
> > -----Original Message-----
> > From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> > Sent: 30 January 2019 10:37
> > To: xen-devel@lists.xenproject.org
> > Cc: Roger Pau Monne <roger.pau@citrix.com>; George Dunlap
> > <George.Dunlap@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew Cooper
> > <Andrew.Cooper3@citrix.com>; Wei Liu <wei.liu2@citrix.com>; Jun Nakajima
> > <jun.nakajima@intel.com>; Kevin Tian <kevin.tian@intel.com>; Paul Durrant
> > <Paul.Durrant@citrix.com>
> > Subject: [PATCH 6/8] x86/mm: split p2m ioreq server pages special handling
> > into helper
> > @@ -627,19 +620,6 @@ p2m_pt_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn,
> > 
> >          p2mt_old = p2m_flags_to_type(l1e_get_flags(*p2m_entry));
> > 
> > -        /*
> > -         * p2m_ioreq_server is only used for 4K pages, so
> > -         * the count is only done for level 1 entries.
> > -         */
> > -        if ( p2mt == p2m_ioreq_server )
> > -            p2m->ioreq.entry_count++;
> > -
> > -        if ( p2mt_old == p2m_ioreq_server )
> > -        {
> > -            ASSERT(p2m->ioreq.entry_count > 0);
> > -            p2m->ioreq.entry_count--;
> > -        }
> > -
> 
> Is p2mt_old still used once this has gone? In my slightly out-of-date branch it looks like it could be removed.

Yes it can be removed, thanks for noticing! Will fix in v2.
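
For reference, the incremental change in v2 would presumably boil down
to something like this (sketch only, not the actual v2 hunk):

  --- a/xen/arch/x86/mm/p2m-pt.c
  +++ b/xen/arch/x86/mm/p2m-pt.c
  @@ ... p2m_pt_set_entry() ...
  -        p2mt_old = p2m_flags_to_type(l1e_get_flags(*p2m_entry));

plus dropping the now-unused declaration of p2mt_old earlier in the
function.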

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH for-4.12 1/8] dom0/pvh: align allocation and mapping order to start address
  2019-01-30 13:58     ` Roger Pau Monné
@ 2019-01-31 17:22       ` Wei Liu
  0 siblings, 0 replies; 62+ messages in thread
From: Wei Liu @ 2019-01-31 17:22 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Wei Liu, Jan Beulich, Andrew Cooper

On Wed, Jan 30, 2019 at 02:58:42PM +0100, Roger Pau Monné wrote:
> On Wed, Jan 30, 2019 at 12:37:28PM +0000, Wei Liu wrote:
> > On Wed, Jan 30, 2019 at 11:36:39AM +0100, Roger Pau Monne wrote:
> > > Due to the recent changes in the iommu mapping logic, the start
> > > addresses provided need to be aligned to the order intended to be
> > > mapped.
> > > 
> > 
> > Can you reference some commits here? What would happen if the address is
> > not aligned?
> 
> See:
> https://lists.xenproject.org/archives/html/xen-devel/2019-01/msg01030.html
> 
> and
> 
> https://lists.xenproject.org/archives/html/xen-devel/2019-01/msg01503.html
> 
> > > dom0 PVH domain builder didn't take this into account when populating
> > > the p2m, fix this by making sure the order is chosen so that the start
> > > address is aligned to it.
> > > 
> > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > > ---
> > > Cc: Jan Beulich <jbeulich@suse.com>
> > > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > > Cc: Wei Liu <wei.liu2@citrix.com>
> > > ---
> > > Without this patch trying to create a PVH dom0 will trigger an assert
> > > on certain hardware depending on the memory map.
> > > ---
> > >  xen/arch/x86/hvm/dom0_build.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
> > > index 51cf490811..a571d15c13 100644
> > > --- a/xen/arch/x86/hvm/dom0_build.c
> > > +++ b/xen/arch/x86/hvm/dom0_build.c
> > > @@ -152,6 +152,8 @@ static int __init pvh_populate_memory_range(struct domain *d,
> > >  
> > >          order = get_order_from_pages(end - start + 1);
> > >          order = min(order ? order - 1 : 0, max_order);
> > > +        /* The order allocated and populated must be aligned to the address. */
> > > +        order = min(order, start ? find_first_set_bit(start) : MAX_ORDER);
> > 
> > Isn't max_order better here?
> 
> It will yield the same result because order has already been limited
> by max_order. I've used MAX_ORDER directly because it's a constant and
> could be faster than loading the value in max_order. You could also
> use 'order' instead of MAX_ORDER and will also yield the same result.
> 
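
(Worked example, purely for illustration: for a RAM range starting at
pfn 0x100, i.e. the 1MB boundary, find_first_set_bit(0x100) is 8, so
the first chunk is clamped to order 8, a naturally aligned 1MB block;
the next iteration then starts at pfn 0x200 and may use order 9, and
so on. For start == 0 there is no alignment constraint at all, hence
the MAX_ORDER fallback.)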

Reviewed-by: Wei Liu <wei.liu2@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH for-4.12 1/8] dom0/pvh: align allocation and mapping order to start address
  2019-01-30 10:36 ` [PATCH for-4.12 1/8] dom0/pvh: align allocation and mapping order to start address Roger Pau Monne
  2019-01-30 12:37   ` Wei Liu
@ 2019-02-04 16:41   ` Jan Beulich
  2019-02-04 17:11     ` Roger Pau Monné
  1 sibling, 1 reply; 62+ messages in thread
From: Jan Beulich @ 2019-02-04 16:41 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, Wei Liu, xen-devel

>>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
> Due to the recent changes in the iommu mapping logic, the start
> addresses provided need to be aligned to the order intended to be
> mapped.

Irrespective of your reply to Wei's similar request (where you've
provided links to mails showing crashes) I'd like you to explain
this better. This is in particular because I don't really see what
"recent changes in the iommu mapping logic" you talk about. The
code change itself looks okay, despite me not being overly happy
with the use of min() here, which iirc was already commented
about.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH for-4.12 2/8] amd/ntp: remove assert that prevents creating 2M MMIO entries
  2019-01-30 10:36 ` [PATCH for-4.12 2/8] amd/ntp: remove assert that prevents creating 2M MMIO entries Roger Pau Monne
@ 2019-02-04 16:48   ` Andrew Cooper
  2019-02-04 16:56   ` Jan Beulich
  1 sibling, 0 replies; 62+ messages in thread
From: Andrew Cooper @ 2019-02-04 16:48 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel
  Cc: George Dunlap, Juergen Gross, Wei Liu, George Dunlap, Jan Beulich

On 30/01/2019 10:36, Roger Pau Monne wrote:

Subject s/ntp/npt/

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH for-4.12 2/8] amd/ntp: remove assert that prevents creating 2M MMIO entries
  2019-01-30 10:36 ` [PATCH for-4.12 2/8] amd/ntp: remove assert that prevents creating 2M MMIO entries Roger Pau Monne
  2019-02-04 16:48   ` Andrew Cooper
@ 2019-02-04 16:56   ` Jan Beulich
  2019-02-04 17:18     ` Roger Pau Monné
  1 sibling, 1 reply; 62+ messages in thread
From: Jan Beulich @ 2019-02-04 16:56 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Juergen Gross, Wei Liu, George Dunlap, Andrew Cooper,
	george.dunlap, xen-devel

>>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
> The assert was originally added to make sure that higher order
> regions (> PAGE_ORDER_4K) could not be used to bypass the
> mmio_ro_ranges check performed by p2m_type_to_flags.
> 
> This however is already checked in set_mmio_p2m_entry, which makes
> sure that higher order mappings don't overlap with mmio_ro_ranges,
> thus allowing the creation of high order MMIO mappings safely.

Well, the assertions were added to make sure no other code
path appears that violates this requirement. Arguably e.g.
set_identity_p2m_entry() could gain an order parameter and
then try to establish larger p2m_mmio_direct entries.

Don't get me wrong, I don't object to the removal of the
assertions, but the description makes it sound as if they were
entirely redundant. Even better would be though if they
could be extended to keep triggering in "bad" cases.

> Remove the assert to allow 2M entries to be created for MMIO regions
> that don't overlap with mmio_ro_ranges.
> 
> Suggested-by: George Dunlap <george.dunlap@citrix.com>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: Juergen Gross <jgross@suse.com>
> ---
> Without this patch trying to create a PVH dom0 will trigger an assert
> on certain hardware depending on the memory map.

Giving a simple example in the description would be helpful.

> --- a/xen/arch/x86/mm/p2m-pt.c
> +++ b/xen/arch/x86/mm/p2m-pt.c
> @@ -668,7 +668,6 @@ p2m_pt_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn,
>          }
>  
>          ASSERT(p2m_flags_to_type(flags) != p2m_ioreq_server);
> -        ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
>          l2e_content = mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt)
>              ? p2m_l2e_from_pfn(mfn_x(mfn),
>                                 p2m_type_to_flags(p2m, p2mt, mfn, 1))

There's a similar check in the 1G mapping logic several lines up. Why
does that not also need removing?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH for-4.12 1/8] dom0/pvh: align allocation and mapping order to start address
  2019-02-04 16:41   ` Jan Beulich
@ 2019-02-04 17:11     ` Roger Pau Monné
  2019-02-05  7:42       ` Jan Beulich
  0 siblings, 1 reply; 62+ messages in thread
From: Roger Pau Monné @ 2019-02-04 17:11 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel

On Mon, Feb 04, 2019 at 09:41:34AM -0700, Jan Beulich wrote:
> >>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
> > Due to the recent changes in the iommu mapping logic, the start
> > addresses provided need to be aligned to the order intended to be
> > mapped.
> 
> Irrespective of your reply to Wei's similar request (where you've
> provided links to mails showing crashes) I'd like you to explain
> this better. This is in particular because I don't really see what
> "recent changes in the iommu mapping logic" you talk about.

Commit 725bf00a87f ("iommu / p2m: add a page_order parameter to
iommu_map/unmap_page()...") added the following two asserts to
iommu_map:

ASSERT(IS_ALIGNED(dfn_x(dfn), (1ul << page_order)));
ASSERT(IS_ALIGNED(mfn_x(mfn), (1ul << page_order)));

Previously iommu_map would add unaligned entries without complaining,
but now in debug builds the assert will trigger.

Maybe adding a 'Fixes' tag would help identifying what this commit is
trying to address?

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH for-4.12 2/8] amd/ntp: remove assert that prevents creating 2M MMIO entries
  2019-02-04 16:56   ` Jan Beulich
@ 2019-02-04 17:18     ` Roger Pau Monné
  2019-02-05  7:45       ` Jan Beulich
  0 siblings, 1 reply; 62+ messages in thread
From: Roger Pau Monné @ 2019-02-04 17:18 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, Wei Liu, George Dunlap, Andrew Cooper,
	george.dunlap, xen-devel

On Mon, Feb 04, 2019 at 09:56:22AM -0700, Jan Beulich wrote:
> >>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
> > The assert was originally added to make sure that higher order
> > regions (> PAGE_ORDER_4K) could not be used to bypass the
> > mmio_ro_ranges check performed by p2m_type_to_flags.
> > 
> > This however is already checked in set_mmio_p2m_entry, which makes
> > sure that higher order mappings don't overlap with mmio_ro_ranges,
> > thus allowing the creation of high order MMIO mappings safely.
> 
> Well, the assertions were added to make sure no other code
> path appears that violates this requirement. Arguably e.g.
> set_identity_p2m_entry() could gain an order parameter and
> then try to establish larger p2m_mmio_direct entries.
> 
> Don't get me wrong, I don't object to the removal of the
> assertions, but the description makes it sound as if they were
> entirely redundant. Even better would be though if they
> could be extended to keep triggering in "bad" cases.

I could add something like:

ASSERT(!rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
                                mfn_x(mfn) + PFN_DOWN(MB(2))));

I think this should be safe and would trigger in case of misuse.

> 
> > Remove the assert to allow 2M entries to be created for MMIO regions
> > that don't overlap with mmio_ro_ranges.
> > 
> > Suggested-by: George Dunlap <george.dunlap@citrix.com>
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Cc: George Dunlap <george.dunlap@eu.citrix.com>
> > Cc: Jan Beulich <jbeulich@suse.com>
> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > Cc: Wei Liu <wei.liu2@citrix.com>
> > Cc: Juergen Gross <jgross@suse.com>
> > ---
> > Without this patch trying to create a PVH dom0 will trigger an assert
> > on certain hardware depending on the memory map.
> 
> Giving a simple example in the description would be helpful.
> 
> > --- a/xen/arch/x86/mm/p2m-pt.c
> > +++ b/xen/arch/x86/mm/p2m-pt.c
> > @@ -668,7 +668,6 @@ p2m_pt_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn,
> >          }
> >  
> >          ASSERT(p2m_flags_to_type(flags) != p2m_ioreq_server);
> > -        ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
> >          l2e_content = mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt)
> >              ? p2m_l2e_from_pfn(mfn_x(mfn),
> >                                 p2m_type_to_flags(p2m, p2mt, mfn, 1))
> 
> There's a similar check in the 1G mapping logic several lines up. Why
> does that not also need removing?

So far mmio_order doesn't allow creation of 1G entries for mmio
regions, that's why I haven't removed that assert. I can however
replace it with the assert suggested above properly adjusted for 1G
pages.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH for-4.12 1/8] dom0/pvh: align allocation and mapping order to start address
  2019-02-04 17:11     ` Roger Pau Monné
@ 2019-02-05  7:42       ` Jan Beulich
  2019-02-05 10:26         ` Roger Pau Monné
  2019-02-05 10:54         ` Andrew Cooper
  0 siblings, 2 replies; 62+ messages in thread
From: Jan Beulich @ 2019-02-05  7:42 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, Wei Liu, xen-devel

>>> On 04.02.19 at 18:11, <roger.pau@citrix.com> wrote:
> On Mon, Feb 04, 2019 at 09:41:34AM -0700, Jan Beulich wrote:
>> >>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
>> > Due to the recent changes in the iommu mapping logic, the start
>> > addresses provided need to be aligned to the order intended to be
>> > mapped.
>> 
>> Irrespective of your reply to Wei's similar request (where you've
>> provided links to mails showing crashes) I'd like you to explain
>> this better. This is in particular because I don't really see what
>> "recent changes in the iommu mapping logic" you talk about.
> 
> Commit 725bf00a87f ("iommu / p2m: add a page_order parameter to
> iommu_map/unmap_page()...") added the following two asserts to
> iommu_map:
> 
> ASSERT(IS_ALIGNED(dfn_x(dfn), (1ul << page_order)));
> ASSERT(IS_ALIGNED(mfn_x(mfn), (1ul << page_order)));
> 
> Previously iommu_map would add unaligned entries without complaining,
> but now in debug builds the assert will trigger.

Right, but the assertions were added to ensure expected behavior,
not to change anything. It was a bug to call the functions without
suitably aligned frame numbers.

> Maybe adding a 'Fixes' tag would help identifying what this commit is
> trying to address?

Yes, but as per above it wouldn't be the commit adding the assertions
that the one here fixes (and hence I think the wording of the commit
message ought to change as well).

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH for-4.12 2/8] amd/ntp: remove assert that prevents creating 2M MMIO entries
  2019-02-04 17:18     ` Roger Pau Monné
@ 2019-02-05  7:45       ` Jan Beulich
  2019-02-05 10:40         ` Roger Pau Monné
  0 siblings, 1 reply; 62+ messages in thread
From: Jan Beulich @ 2019-02-05  7:45 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Juergen Gross, Wei Liu, George Dunlap, Andrew Cooper,
	george.dunlap, xen-devel

>>> On 04.02.19 at 18:18, <roger.pau@citrix.com> wrote:
> On Mon, Feb 04, 2019 at 09:56:22AM -0700, Jan Beulich wrote:
>> >>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
>> > The assert was originally added to make sure that higher order
>> > regions (> PAGE_ORDER_4K) could not be used to bypass the
>> > mmio_ro_ranges check performed by p2m_type_to_flags.
>> > 
>> > This however is already checked in set_mmio_p2m_entry, which makes
>> > sure that higher order mappings don't overlap with mmio_ro_ranges,
>> > thus allowing the creation of high order MMIO mappings safely.
>> 
>> Well, the assertions were added to make sure no other code
>> path appears that violates this requirement. Arguably e.g.
>> set_identity_p2m_entry() could gain an order parameter and
>> then try to establish larger p2m_mmio_direct entries.
>> 
>> Don't get me wrong, I don't object to the removal of the
>> assertions, but the description makes it sound as if they were
>> entirely redundant. Even better would be though if they
>> could be extended to keep triggering in "bad" cases.
> 
> I could add something like:
> 
> ASSERT(!rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
>                                 mfn_x(mfn) + PFN_DOWN(MB(2))));
> 
> I think this should be safe and would trigger in case of misuse.

Looks okay, if slightly extended (or made conditional) to exclude
the addition of MB(2) to MFN_INVALID to wrap and potentially
hit a r/o range in the low 1Mb.

>> > --- a/xen/arch/x86/mm/p2m-pt.c
>> > +++ b/xen/arch/x86/mm/p2m-pt.c
>> > @@ -668,7 +668,6 @@ p2m_pt_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn,
>> >          }
>> >  
>> >          ASSERT(p2m_flags_to_type(flags) != p2m_ioreq_server);
>> > -        ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
>> >          l2e_content = mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt)
>> >              ? p2m_l2e_from_pfn(mfn_x(mfn),
>> >                                 p2m_type_to_flags(p2m, p2mt, mfn, 1))
>> 
>> There's a similar check in the 1G mapping logic several lines up. Why
>> does that not also need removing?
> 
> So far mmio_order doesn't allow creation of 1G entries for mmio
> regions, that's why I haven't removed that assert. I can however
> replace it with the assert suggested above properly adjusted for 1G
> pages.

Yes, this or explicitly say in the description why you leave alone the
other one.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH for-4.12 1/8] dom0/pvh: align allocation and mapping order to start address
  2019-02-05  7:42       ` Jan Beulich
@ 2019-02-05 10:26         ` Roger Pau Monné
  2019-02-05 11:38           ` Jan Beulich
  2019-02-05 10:54         ` Andrew Cooper
  1 sibling, 1 reply; 62+ messages in thread
From: Roger Pau Monné @ 2019-02-05 10:26 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel

On Tue, Feb 05, 2019 at 12:42:02AM -0700, Jan Beulich wrote:
> >>> On 04.02.19 at 18:11, <roger.pau@citrix.com> wrote:
> > On Mon, Feb 04, 2019 at 09:41:34AM -0700, Jan Beulich wrote:
> >> >>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
> >> > Due to the recent changes in the iommu mapping logic, the start
> >> > addresses provided need to be aligned to the order intended to be
> >> > mapped.
> >> 
> >> Irrespective of your reply to Wei's similar request (where you've
> >> provided links to mails showing crashes) I'd like you to explain
> >> this better. This is in particular because I don't really see what
> >> "recent changes in the iommu mapping logic" you talk about.
> > 
> > Commit 725bf00a87f ("iommu / p2m: add a page_order parameter to
> > iommu_map/unmap_page()...") added the following two asserts to
> > iommu_map:
> > 
> > ASSERT(IS_ALIGNED(dfn_x(dfn), (1ul << page_order)));
> > ASSERT(IS_ALIGNED(mfn_x(mfn), (1ul << page_order)));
> > 
> > Previously iommu_map would add unaligned entries without complaining,
> > but now in debug builds the assert will trigger.
> 
> Right, but the assertions were added to ensure expected behavior,
> not to change anything. It was a bug to call the functions without
> suitably aligned frame numbers.

Hm, OK. As a note, p2m related map/unmap functions will still work
correctly when called with non-aligned addresses and orders, and will
then call the iommu helpers with those addresses and orders. In fact
the code that triggers the asserts in the bug reports provided is
from calls into the iommu helpers made by the p2m mapping functions.

Although I understand your point about the alignment, none of the p2m
or iommu related functions ever expressed this requirement in the code,
so users of those interfaces were not aware of it until the assert was
added.
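
For illustration, this is the kind of clamping callers now need to
apply (fragment mirroring the hunk in patch 1; map() is a stand-in for
whichever p2m/iommu helper is being called):

    while ( start <= end )
    {
        unsigned int order = get_order_from_pages(end - start + 1);

        /* Largest order not exceeding the remaining size... */
        order = min(order ? order - 1 : 0, max_order);
        /* ...further clamped so that 'start' is aligned to it. */
        order = min(order, start ? find_first_set_bit(start) : MAX_ORDER);

        rc = map(start, order);
        if ( rc )
            break;
        start += 1UL << order;
    }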

> > Maybe adding a 'Fixes' tag would help identifying what this commit is
> > trying to address?
> 
> Yes, but as per above it wouldn't be the commit adding the assertions
> that the one here fixes (and hence I think the wording of the commit
> message ought to change as well).

Would you be fine with the following message:

"
The p2m and iommu mapping code has always had the requirement that
addresses must be aligned to the order used when populating the p2m or
the iommu page tables.

PVH dom0 builder didn't take this requirement into account, and can
call into the p2m/iommu mapping helpers with addresses and orders that
are not aligned.

Fix this by making sure the orders passed to the physmap population
helpers are always such that the guest address to be populated is
aligned to them.
"

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH for-4.12 2/8] amd/ntp: remove assert that prevents creating 2M MMIO entries
  2019-02-05  7:45       ` Jan Beulich
@ 2019-02-05 10:40         ` Roger Pau Monné
  2019-02-05 12:44           ` Jan Beulich
  0 siblings, 1 reply; 62+ messages in thread
From: Roger Pau Monné @ 2019-02-05 10:40 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, Wei Liu, George Dunlap, Andrew Cooper,
	george.dunlap, xen-devel

On Tue, Feb 05, 2019 at 12:45:56AM -0700, Jan Beulich wrote:
> >>> On 04.02.19 at 18:18, <roger.pau@citrix.com> wrote:
> > On Mon, Feb 04, 2019 at 09:56:22AM -0700, Jan Beulich wrote:
> >> >>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
> >> > The assert was originally added to make sure that higher order
> >> > regions (> PAGE_ORDER_4K) could not be used to bypass the
> >> > mmio_ro_ranges check performed by p2m_type_to_flags.
> >> > 
> >> > This however is already checked in set_mmio_p2m_entry, which makes
> >> > sure that higher order mappings don't overlap with mmio_ro_ranges,
> >> > thus allowing the creation of high order MMIO mappings safely.
> >> 
> >> Well, the assertions were added to make sure no other code
> >> path appears that violates this requirement. Arguably e.g.
> >> set_identity_p2m_entry() could gain an order parameter and
> >> then try to establish larger p2m_mmio_direct entries.
> >> 
> >> Don't get me wrong, I don't object to the removal of the
> >> assertions, but the description makes it sound as if they were
> >> entirely redundant. Even better would be though if they
> >> could be extended to keep triggering in "bad" cases.
> > 
> > I could add something like:
> > 
> > ASSERT(!rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
> >                                 mfn_x(mfn) + PFN_DOWN(MB(2))));
> > 
> > I think this should be safe and would trigger in case of misuse.
> 
> Looks okay, if slightly extended (or made conditional) to exclude
> the addition of MB(2) to MFN_INVALID to wrap and potentially
> hit a r/o range in the low 1Mb.

Ack, so it would be:

ASSERT(mfn_eq(mfn, INVALID_MFN) ||
       !rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
                                mfn_x(mfn) + PFN_DOWN(MB(2))));


> >> > --- a/xen/arch/x86/mm/p2m-pt.c
> >> > +++ b/xen/arch/x86/mm/p2m-pt.c
> >> > @@ -668,7 +668,6 @@ p2m_pt_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn,
> >> >          }
> >> >  
> >> >          ASSERT(p2m_flags_to_type(flags) != p2m_ioreq_server);
> >> > -        ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
> >> >          l2e_content = mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt)
> >> >              ? p2m_l2e_from_pfn(mfn_x(mfn),
> >> >                                 p2m_type_to_flags(p2m, p2mt, mfn, 1))
> >> 
> >> There's a similar check in the 1G mapping logic several lines up. Why
> >> does that not also need removing?
> > 
> > So far mmio_order doesn't allow creation of 1G entries for mmio
> > regions, that's why I haven't removed that assert. I can however
> > replace it with the assert suggested above properly adjusted for 1G
> > pages.
> 
> Yes, this or explicitly say in the description why you leave alone the
> other one.

I think changing the assert is likely more future proof.
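
I.e. the 1G path would then get something along the lines of (sketch
only, assuming a GB() counterpart to the MB() macro used above):

ASSERT(mfn_eq(mfn, INVALID_MFN) ||
       !rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
                                mfn_x(mfn) + PFN_DOWN(GB(1))));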

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH for-4.12 3/8] iommu/pvh: add reserved regions below 1MB to the iommu page tables
  2019-01-30 10:36 ` [PATCH for-4.12 3/8] iommu/pvh: add reserved regions below 1MB to the iommu page tables Roger Pau Monne
@ 2019-02-05 10:47   ` Jan Beulich
  2019-02-05 11:15     ` Roger Pau Monné
  0 siblings, 1 reply; 62+ messages in thread
From: Jan Beulich @ 2019-02-05 10:47 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Juergen Gross, xen-devel

>>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
> Reserved memory ranges below 1MB on a PVH dom0 are added to the HAP
> page tables, but due to this being done before setting up the IOMMU
> the non RAM regions in those areas are not added to the IOMMU page
> tables. Fix this by making sure any reserved regions below 1MB are
> added to the IOMMU page tables.

So what was the reason again that we call iommu_hwdom_init() after
pvh_setup_p2m()? Am I mis-remembering there having been a patch
to flip their order?

> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -151,12 +151,7 @@ static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
>       * inclusive mapping additionally maps in every pfn up to 4GB except those
>       * that fall in unusable ranges for PV Dom0.
>       */
> -    if ( (pfn > max_pfn && !mfn_valid(mfn)) || xen_in_range(pfn) ||
> -         /*
> -          * Ignore any address below 1MB, that's already identity mapped by the
> -          * Dom0 builder for HVM.
> -          */
> -         (!d->domain_id && is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) )

There was a domain ID check here, and the comment explicitly said
Dom0.

> +    if ( (pfn > max_pfn && !mfn_valid(mfn)) || xen_in_range(pfn) )
>          return false;
>  
>      switch ( type = page_get_ram_type(mfn) )
> @@ -245,7 +240,12 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
>          if ( !hwdom_iommu_map(d, pfn, max_pfn) )
>              continue;
>  
> -        if ( paging_mode_translate(d) )
> +        /*
> +         * Don't add any address below 1MB to the HAP page tables, that's
> +         * already done by the domain builder. Add addresses below 1MB to the
> +         * IOMMU page tables only.
> +         */
> +        if ( paging_mode_translate(d) && pfn >= PFN_DOWN(MB(1)) )

Nothing like this here. Did you determine that in the late hwdom
case things work without that extra precaution (i.e. the removed
check was really pointless)? If so, mentioning this would be helpful
(at the very least to be sure this was intentional).

Jan

>              rc = set_identity_p2m_entry(d, pfn, p2m_access_rw, 0);
>          else
>              rc = iommu_map(d, _dfn(pfn), _mfn(pfn), PAGE_ORDER_4K,
> -- 
> 2.20.1





_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH for-4.12 1/8] dom0/pvh: align allocation and mapping order to start address
  2019-02-05  7:42       ` Jan Beulich
  2019-02-05 10:26         ` Roger Pau Monné
@ 2019-02-05 10:54         ` Andrew Cooper
  2019-02-05 11:37           ` Jan Beulich
  1 sibling, 1 reply; 62+ messages in thread
From: Andrew Cooper @ 2019-02-05 10:54 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monne; +Cc: xen-devel, Wei Liu

On 05/02/2019 07:42, Jan Beulich wrote:
>>>> On 04.02.19 at 18:11, <roger.pau@citrix.com> wrote:
>> On Mon, Feb 04, 2019 at 09:41:34AM -0700, Jan Beulich wrote:
>>>>>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
>>>> Due to the recent changes in the iommu mapping logic, the start
>>>> addresses provided need to be aligned to the order intended to be
>>>> mapped.
>>> Irrespective of your reply to Wei's similar request (where you've
>>> provided links to mails showing crashes) I'd like you to explain
>>> this better. This is in particular because I don't really see what
>>> "recent changes in the iommu mapping logic" you talk about.
>> Commit 725bf00a87f ("iommu / p2m: add a page_order parameter to
>> iommu_map/unmap_page()...") added the following two asserts to
>> iommu_map:
>>
>> ASSERT(IS_ALIGNED(dfn_x(dfn), (1ul << page_order)));
>> ASSERT(IS_ALIGNED(mfn_x(mfn), (1ul << page_order)));
>>
>> Previously iommu_map would add unaligned entries without complaining,
>> but now in debug builds the assert will trigger.
> Right, but the assertions were added to ensure expected behavior,
> not to change anything.

No - this isn't reasonable.

Those assertions were added "because no one should be violating them".

As it turns out, that expectation was false.  There are real codepaths
which do trip this assert, which functioned correctly before.

The two options are to either to bugfix the assertion failures by
removing the assertions, or do some code improvement to update callers
to be consistent with the new, different, expectation.

Roger was correct to begin with.  The IOMMU code has recently shifted
expectations, in a way which currently malfunctions only in debug builds.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH for-4.12 3/8] iommu/pvh: add reserved regions below 1MB to the iommu page tables
  2019-02-05 10:47   ` Jan Beulich
@ 2019-02-05 11:15     ` Roger Pau Monné
  2019-02-05 12:49       ` Jan Beulich
  0 siblings, 1 reply; 62+ messages in thread
From: Roger Pau Monné @ 2019-02-05 11:15 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Juergen Gross, xen-devel

On Tue, Feb 05, 2019 at 03:47:49AM -0700, Jan Beulich wrote:
> >>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
> > Reserved memory ranges below 1MB on a PVH dom0 are added to the HAP
> > page tables, but due to this being done before setting up the IOMMU
> > the non RAM regions in those areas are not added to the IOMMU page
> > tables. Fix this by making sure any reserved regions below 1MB are
> > added to the IOMMU page tables.
> 
> So what was the reason again that we call iommu_hwdom_init() after
> pvh_setup_p2m()? Am I mis-remembering there having been a patch
> to flip their order?

Yes - IIRC I found (broken) hardware that requires the iommu page
tables to also contain certain RAM regions or else you get page faults
or a complete system freeze when the iommu is enabled.

It could be argued that even with this workaround such hardware is
still likely to be broken anyway, because RAM regions are not identity
mapped. From the emails I can find I was only able to observe this
behavior with pre-Haswell hardware.

Maybe it would be easier to just enable the iommu before populating
the RAM regions in the p2m? That would simplify the code here.

> > --- a/xen/drivers/passthrough/x86/iommu.c
> > +++ b/xen/drivers/passthrough/x86/iommu.c
> > @@ -151,12 +151,7 @@ static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
> >       * inclusive mapping additionally maps in every pfn up to 4GB except those
> >       * that fall in unusable ranges for PV Dom0.
> >       */
> > -    if ( (pfn > max_pfn && !mfn_valid(mfn)) || xen_in_range(pfn) ||
> > -         /*
> > -          * Ignore any address below 1MB, that's already identity mapped by the
> > -          * Dom0 builder for HVM.
> > -          */
> > -         (!d->domain_id && is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) )
> 
> There was a domain ID check here, and the comment explicitly said
> Dom0.
> 
> > +    if ( (pfn > max_pfn && !mfn_valid(mfn)) || xen_in_range(pfn) )
> >          return false;
> >  
> >      switch ( type = page_get_ram_type(mfn) )
> > @@ -245,7 +240,12 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
> >          if ( !hwdom_iommu_map(d, pfn, max_pfn) )
> >              continue;
> >  
> > -        if ( paging_mode_translate(d) )
> > +        /*
> > +         * Don't add any address below 1MB to the HAP page tables, that's
> > +         * already done by the domain builder. Add addresses below 1MB to the
> > +         * IOMMU page tables only.
> > +         */
> > +        if ( paging_mode_translate(d) && pfn >= PFN_DOWN(MB(1)) )
> 
> Nothing like this here. Did you determine that in the late hwdom
> case things work without that extra precaution (i.e. the removed
> check was really pointless)? If so, mentioning this would be helpful
> (at the very least to be sure this was intentional).

We don't currently have support for a pvh late-hwdom AFAIK, and
whether this check is necessary or not depends on how such pvh
late-hwdom is built, in particular how the low 1MB is handled.

Maybe it's better to just forget about the pre-haswell workarounds and
enable the iommu before populating the p2m, that would certainly
simplify the code here by removing the low 1MB special casing.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH for-4.12 4/8] x86/shadow: alloc enough pages so initialization doesn't fail
  2019-01-30 10:36 ` [PATCH for-4.12 4/8] x86/shadow: alloc enough pages so initialization doesn't fail Roger Pau Monne
@ 2019-02-05 11:21   ` Jan Beulich
  2019-02-05 11:47     ` Roger Pau Monné
  0 siblings, 1 reply; 62+ messages in thread
From: Jan Beulich @ 2019-02-05 11:21 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, xen-devel

>>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
> Current code in shadow_enable will allocate a shadow pool of 4MB
> regardless of the values of sh_min_allocation or
> shadow_min_acceptable_pages, which means that calls to
> shadow_alloc_p2m_page can fail even after the check and allocation
> done just above.
> 
> Fix this by always checking that the pool is big enough so the rest of
> the shadow_init function cannot fail due to lack of pages in the
> shadow pool. This is relevant to shadow_alloc_p2m_page which requires
> a minimum amount of shadow_min_acceptable_pages(d) + 1 in the pool.
> 
> This allows booting a guest using shadow and more than 6 vCPUs.

I'm routinely booting 8-vCPU guests without issues.

> --- a/xen/arch/x86/mm/shadow/common.c
> +++ b/xen/arch/x86/mm/shadow/common.c
> @@ -2705,6 +2705,11 @@ int shadow_enable(struct domain *d, u32 mode)
>      uint32_t *e;
>      int rv = 0;
>      struct p2m_domain *p2m = p2m_get_hostp2m(d);
> +    /*
> +     * Required minimum amount of pool pages plus 4MB. This is required so the
> +     * calls to p2m_alloc_table and shadow_alloc_p2m_page below don't fail.
> +     */
> +    unsigned int min_pages = shadow_min_acceptable_pages(d) + 1024;

sh_min_allocation() also takes the memory size of the domain into
account. Aren't you therefore risking regressing larger guests by
instead using a fixed amount here? The more that ...

> @@ -2719,10 +2724,10 @@ int shadow_enable(struct domain *d, u32 mode)
>  
>      /* Init the shadow memory allocation if the user hasn't done so */
>      old_pages = d->arch.paging.shadow.total_pages;
> -    if ( old_pages < sh_min_allocation(d) + d->arch.paging.shadow.p2m_pages )
> +    if ( old_pages < min_pages )

... the right side of the + here goes away as well.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH for-4.12 1/8] dom0/pvh: align allocation and mapping order to start address
  2019-02-05 10:54         ` Andrew Cooper
@ 2019-02-05 11:37           ` Jan Beulich
  0 siblings, 0 replies; 62+ messages in thread
From: Jan Beulich @ 2019-02-05 11:37 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Wei Liu, Roger Pau Monne

>>> On 05.02.19 at 11:54, <andrew.cooper3@citrix.com> wrote:
> On 05/02/2019 07:42, Jan Beulich wrote:
>>>>> On 04.02.19 at 18:11, <roger.pau@citrix.com> wrote:
>>> On Mon, Feb 04, 2019 at 09:41:34AM -0700, Jan Beulich wrote:
>>>>>>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
>>>>> Due to the recent changes in the iommu mapping logic, the start
>>>>> addresses provided need to be aligned to the order intended to be
>>>>> mapped.
>>>> Irrespective of your reply to Wei's similar request (where you've
>>>> provided links to mails showing crashes) I'd like you to explain
>>>> this better. This is in particular because I don't really see what
>>>> "recent changes in the iommu mapping logic" you talk about.
>>> Commit 725bf00a87f ("iommu / p2m: add a page_order parameter to
>>> iommu_map/unmap_page()...") added the following two asserts to
>>> iommu_map:
>>>
>>> ASSERT(IS_ALIGNED(dfn_x(dfn), (1ul << page_order)));
>>> ASSERT(IS_ALIGNED(mfn_x(mfn), (1ul << page_order)));
>>>
>>> Previously iommu_map would add unaligned entries without complaining,
>>> but now in debug builds the assert will trigger.
>> Right, but the assertions were added to ensure expected behavior,
>> not to change anything.
> 
> No - this isn't reasonable.
> 
> Those assertions were added "because noone should be violating them".
> 
> As it turns out, that expectation was false.  There are real codepaths
> which do trip this assert, which functioned correctly before.
> 
> The two options are to either to bugfix the assertion failures by
> removing the assertions, or do some code improvement to update callers
> to be consistent with the new, different, expectation.
> 
> Roger was correct to being with.  The IOMMU code has recently shifted
> expectations, in a way which currently malfunctions only in debug builds.

I'm sorry, but no, I can't agree with such a position: Be it P2M or
IOMMU, requesting larger order mappings with frame numbers not
suitably aligned for the requested order is a mistake. It is the very
nature of "order" passed to both allocation and mapping functions
that returned blocks of memory or produced mappings adhere to
the requested order in terms of both size and alignment. (Was it
you or someone else who recently asked whether an allocation
with order > 0 would produce an order-aligned chunk of memory?)
For anything else it should indeed be a count rather than an order
to be passed in.
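
(To put a number on it, purely as an illustration: an order-9 request
covers 2MB, so the assertions quoted earlier amount to

    ASSERT(IS_ALIGNED(dfn_x(dfn), 1ul << 9));  /* dfn 0x200, 0x400, ... */
    ASSERT(IS_ALIGNED(mfn_x(mfn), 1ul << 9));

i.e. both frame numbers must be multiples of 512 pages, which is
exactly what an order-9 allocation returns by construction.)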

It's a second mistake for any code to have accepted such inputs.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH for-4.12 1/8] dom0/pvh: align allocation and mapping order to start address
  2019-02-05 10:26         ` Roger Pau Monné
@ 2019-02-05 11:38           ` Jan Beulich
  0 siblings, 0 replies; 62+ messages in thread
From: Jan Beulich @ 2019-02-05 11:38 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, Wei Liu, xen-devel

>>> On 05.02.19 at 11:26, <roger.pau@citrix.com> wrote:
> Would you be fine with the following message:
> 
> "
> The p2m and iommu mapping code always had the requirement that
> addresses and orders must be aligned when populating the p2m or the
> iommu page tables.
> 
> PVH dom0 builder didn't take this requirement into account, and can
> call into the p2m/iommu mapping helpers with addresses and orders that
> are not aligned.
> 
> Fix this by making sure the orders passed to the physmap population
> helpers are always aligned to the guest address to be populated.
> "

LGTM, but it looks like there's wider disagreement (as per the other
sub-thread).

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH for-4.12 4/8] x86/shadow: alloc enough pages so initialization doesn't fail
  2019-02-05 11:21   ` Jan Beulich
@ 2019-02-05 11:47     ` Roger Pau Monné
  2019-02-05 12:55       ` Jan Beulich
  0 siblings, 1 reply; 62+ messages in thread
From: Roger Pau Monné @ 2019-02-05 11:47 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, xen-devel

On Tue, Feb 05, 2019 at 04:21:52AM -0700, Jan Beulich wrote:
> >>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
> > Current code in shadow_enable will allocate a shadow pool of 4MB
> > regardless of the values of sh_min_allocation or
> > shadow_min_acceptable_pages, which means that calls to
> > shadow_alloc_p2m_page can fail even after the check and allocation
> > done just above.
> > 
> > Fix this by always checking that the pool is big enough so the rest of
> > the shadow_init function cannot fail due to lack of pages in the
> > shadow pool. This is relevant to shadow_alloc_p2m_page which requires
> > a minimum amount of shadow_min_acceptable_pages(d) + 1 in the pool.
> > 
> > This allows booting a guest using shadow and more than 6 vCPUs.
> 
> I'm routinely booting 8-vCPU guests without issues.

For me the following simple example with 8 vcpus doesn't work:

# cat test.cfg
name = "test"
type = "hvm"

memory = 256
vcpus = 8
hap = 0
# xl create test.cfg
Parsing config from test.cfg
libxl: error: libxl_create.c:578:libxl__domain_make: domain creation fail: Cannot allocate memory
libxl: error: libxl_create.c:975:initiate_domain_create: cannot make domain: -3

> 
> > --- a/xen/arch/x86/mm/shadow/common.c
> > +++ b/xen/arch/x86/mm/shadow/common.c
> > @@ -2705,6 +2705,11 @@ int shadow_enable(struct domain *d, u32 mode)
> >      uint32_t *e;
> >      int rv = 0;
> >      struct p2m_domain *p2m = p2m_get_hostp2m(d);
> > +    /*
> > +     * Required minimum amount of pool pages plus 4MB. This is required so the
> > +     * calls to p2m_alloc_table and shadow_alloc_p2m_page below don't fail.
> > +     */
> > +    unsigned int min_pages = shadow_min_acceptable_pages(d) + 1024;
> 
> sh_min_allocation() also takes the memory size of the domain into
> account. Aren't you therefore risking to regress larger guests by
> instead using a fixed amount here? The more that ...

shadow_enable is called by domain_create, and at this point the
memory size of the guest is not yet known AFAICT. I assume the
toolstack will make further hypercalls to set a suitably sized shadow
memory pool after the domain has been created and before populating
the physmap.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH for-4.12 2/8] amd/ntp: remove assert that prevents creating 2M MMIO entries
  2019-02-05 10:40         ` Roger Pau Monné
@ 2019-02-05 12:44           ` Jan Beulich
  2019-02-05 13:38             ` Roger Pau Monné
  0 siblings, 1 reply; 62+ messages in thread
From: Jan Beulich @ 2019-02-05 12:44 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Juergen Gross, Wei Liu, George Dunlap, Andrew Cooper,
	george.dunlap, xen-devel

>>> On 05.02.19 at 11:40, <roger.pau@citrix.com> wrote:
> On Tue, Feb 05, 2019 at 12:45:56AM -0700, Jan Beulich wrote:
>> >>> On 04.02.19 at 18:18, <roger.pau@citrix.com> wrote:
>> > On Mon, Feb 04, 2019 at 09:56:22AM -0700, Jan Beulich wrote:
>> >> >>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
>> >> > The assert was originally added to make sure that higher order
>> >> > regions (> PAGE_ORDER_4K) could not be used to bypass the
>> >> > mmio_ro_ranges check performed by p2m_type_to_flags.
>> >> > 
>> >> > This however is already checked in set_mmio_p2m_entry, which makes
>> >> > sure that higher order mappings don't overlap with mmio_ro_ranges,
>> >> > thus allowing the creation of high order MMIO mappings safely.
>> >> 
>> >> Well, the assertions were added to make sure no other code
>> >> path appears that violates this requirement. Arguably e.g.
>> >> set_identity_p2m_entry() could gain an order parameter and
>> >> then try to establish larger p2m_mmio_direct entries.
>> >> 
>> >> Don't get me wrong, I don't object to the removal of the
>> >> assertions, but the description makes it sound as if they were
>> >> entirely redundant. Even better would be though if they
>> >> could be extended to keep triggering in "bad" cases.
>> > 
>> > I could add something like:
>> > 
>> > ASSERT(!rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
>> >                                 mfn_x(mfn) + PFN_DOWN(MB(2))));
>> > 
>> > I think this should be safe and would trigger in case of misuse.
>> 
>> Looks okay, if slightly extended (or made conditional) to exclude
>> the addition of MB(2) to MFN_INVALID to wrap and potentially
>> hit a r/o range in the low 1Mb.
> 
> Ack, so it would be:
> 
> ASSERT(mfn_eq(mfn, INVALID_MFN) ||
>        !rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
>                                 mfn_x(mfn) + PFN_DOWN(MB(2))));

But that's still dropping the other aspect of the original ASSERT():

>> >> > --- a/xen/arch/x86/mm/p2m-pt.c
>> >> > +++ b/xen/arch/x86/mm/p2m-pt.c
>> >> > @@ -668,7 +668,6 @@ p2m_pt_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn,
>> >> >          }
>> >> >  
>> >> >          ASSERT(p2m_flags_to_type(flags) != p2m_ioreq_server);
>> >> > -        ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);

It also made sure that "valid" MFNs can't be used for mappings with
p2m_mmio_direct type. Except that I realize now that this is wrong in
certain cases, because MMIO pages may actually have "valid" MFNs.
mfn_valid(), after all, only tells us whether there's a struct page_info
for the MFN. I wonder if it's really this brokenness that you hit,
rather than what is explained in the description.

When the assertion was introduced, MMIO wasn't handled by the
code correctly anyway (!mfn_valid() MFNs would not have got any
mappings at all in the 2M and 1G paths), whereas now we have
p2m_allows_invalid_mfn() there. So the situation has become worse
with other nearby changes. As a result I think we want to correct
the assertion here alongside the addition of what you suggest
above. What about

    if ( p2mt != p2m_mmio_direct )
        ASSERT(mfn_valid(mfn) || (mfn_eq(mfn, INVALID_MFN) &&
               p2m_allows_invalid_mfn(p2mt)));
    else
        ASSERT(!mfn_eq(mfn, INVALID_MFN) &&
               !rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
                                        mfn_x(mfn) + PFN_DOWN(MB(2))));

? This could be further improved by checking the page owner to
(not) be dom_io, but doing so perhaps goes too far at this point.

>> >> >          l2e_content = mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt)
>> >> >              ? p2m_l2e_from_pfn(mfn_x(mfn),
>> >> >                                 p2m_type_to_flags(p2m, p2mt, mfn, 1))
>> >> 
>> >> There's a similar check in the 1G mapping logic several lines up. Why
>> >> does that not also need removing?
>> > 
>> > So far mmio_order doesn't allow creation of 1G entries for mmio
>> > regions, that's why I haven't removed that assert. I can however
>> > replace it with the assert suggested above properly adjusted for 1G
>> > pages.
>> 
>> Yes, this or explicitly say in the description why you leave alone the
>> other one.
> 
> I think changing the assert is likely more future proof.

I agree, but I didn't want to omit the alternative.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH for-4.12 3/8] iommu/pvh: add reserved regions below 1MB to the iommu page tables
  2019-02-05 11:15     ` Roger Pau Monné
@ 2019-02-05 12:49       ` Jan Beulich
  2019-02-05 14:01         ` Roger Pau Monné
  0 siblings, 1 reply; 62+ messages in thread
From: Jan Beulich @ 2019-02-05 12:49 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Juergen Gross, xen-devel

>>> On 05.02.19 at 12:15, <roger.pau@citrix.com> wrote:
> On Tue, Feb 05, 2019 at 03:47:49AM -0700, Jan Beulich wrote:
>> >>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
>> > Reserved memory ranges below 1MB on a PVH dom0 are added to the HAP
>> > page tables, but due to this being done before setting up the IOMMU
>> > the non RAM regions in those areas are not added to the IOMMU page
>> > tables. Fix this by making sure any reserved regions below 1MB are
>> > added to the IOMMU page tables.
>> 
>> So what was the reason again that we call iommu_hwdom_init() after
>> pvh_setup_p2m()? Am I mis-remembering there having been a patch
>> to flip their order?
> 
> Yes - IIRC I found (broken) hardware that requires the iommu page
> tables to also contain certain RAM regions or else you get page faults
> or a complete system freeze when the iommu is enabled.
> 
> It could be argued that even with this workaround such hardware is
> still likely to be broken anyway, because RAM regions are not identity
> mapped. From the emails I can find I was only able to observe this
> behavior with pre-Haswell hardware.
> 
> Maybe it would be easier to just enable the iommu before populating
> the RAM regions in the p2m? That would simplify the code here.

When, hence my ordering question.

>> > --- a/xen/drivers/passthrough/x86/iommu.c
>> > +++ b/xen/drivers/passthrough/x86/iommu.c
>> > @@ -151,12 +151,7 @@ static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
>> >       * inclusive mapping additionally maps in every pfn up to 4GB except those
>> >       * that fall in unusable ranges for PV Dom0.
>> >       */
>> > -    if ( (pfn > max_pfn && !mfn_valid(mfn)) || xen_in_range(pfn) ||
>> > -         /*
>> > -          * Ignore any address below 1MB, that's already identity mapped by the
>> > -          * Dom0 builder for HVM.
>> > -          */
>> > -         (!d->domain_id && is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) )
>> 
>> There was a domain ID check here, and the comment explicitly said
>> Dom0.
>> 
>> > +    if ( (pfn > max_pfn && !mfn_valid(mfn)) || xen_in_range(pfn) )
>> >          return false;
>> >  
>> >      switch ( type = page_get_ram_type(mfn) )
>> > @@ -245,7 +240,12 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
>> >          if ( !hwdom_iommu_map(d, pfn, max_pfn) )
>> >              continue;
>> >  
>> > -        if ( paging_mode_translate(d) )
>> > +        /*
>> > +         * Don't add any address below 1MB to the HAP page tables, that's
>> > +         * already done by the domain builder. Add addresses below 1MB to the
>> > +         * IOMMU page tables only.
>> > +         */
>> > +        if ( paging_mode_translate(d) && pfn >= PFN_DOWN(MB(1)) )
>> 
>> Nothing like this here. Did you determine that in the late hwdom
>> case things work without that extra precaution (i.e. the removed
>> check was really pointless)? If so, mentioning this would be helpful
>> (at the very least to be sure this was intentional).
> 
> We don't currently have support for a pvh late-hwdom AFAIK, and
> whether this check is necessary or not depends on how such pvh
> late-hwdom is built, explicitly how the low 1MB is handled.

Well, till now I've been assuming that the late hwdom (in the PV case)
would be built using the normal tool stack logic. I would then extend
this to PVH, and expect Xen to take care of the delta between what
the tool stack does and what the hardware domain needs.

> Maybe it's better to just forget about the pre-haswell workarounds and
> enable the iommu before populating the p2m, that would certainly
> simplify the code here by removing the low 1MB special casing.

Are you convinced that those workarounds are attributable to the
CPU family, and that hence with Haswell and newer they're gone
altogether?

Jan




* Re: [PATCH for-4.12 4/8] x86/shadow: alloc enough pages so initialization doesn't fail
  2019-02-05 11:47     ` Roger Pau Monné
@ 2019-02-05 12:55       ` Jan Beulich
  2019-02-05 13:52         ` Roger Pau Monné
  0 siblings, 1 reply; 62+ messages in thread
From: Jan Beulich @ 2019-02-05 12:55 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, xen-devel

>>> On 05.02.19 at 12:47, <roger.pau@citrix.com> wrote:
> On Tue, Feb 05, 2019 at 04:21:52AM -0700, Jan Beulich wrote:
>> >>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
>> > Current code in shadow_enable will allocate a shadow pool of 4MB
>> > regardless of the values of sh_min_allocation or
>> > shadow_min_acceptable_pages, which means that calls to
>> > shadow_alloc_p2m_page can fail even after the check and allocation
>> > done just above.
>> > 
>> > Fix this by always checking that the pool is big enough so the rest of
>> > the shadow_init function cannot fail due to lack of pages in the
>> > shadow pool. This is relevant to shadow_alloc_p2m_page which requires
>> > a minimum amount of shadow_min_acceptable_pages(d) + 1 in the pool.
>> > 
>> > This allows booting a guest using shadow and more than 6 vCPUs.
>> 
>> I'm routinely booting 8-vCPU guests without issues.
> 
> For me the following simple example with 8 vcpus doesn't work:
> 
> # cat test.cfg
> name = "test"
> type = "hvm"
> 
> memory = 256

I admit I've never tried this small a guest with ...

> vcpus = 8

... this many vCPU-s.

> hap = 0
> # xl create test.cfg
> Parsing config from test.cfg
> libxl: error: libxl_create.c:578:libxl__domain_make: domain creation fail: 
> Cannot allocate memory
> libxl: error: libxl_create.c:975:initiate_domain_create: cannot make domain: 
> -3
> 
>> 
>> > --- a/xen/arch/x86/mm/shadow/common.c
>> > +++ b/xen/arch/x86/mm/shadow/common.c
>> > @@ -2705,6 +2705,11 @@ int shadow_enable(struct domain *d, u32 mode)
>> >      uint32_t *e;
>> >      int rv = 0;
>> >      struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> > +    /*
>> > +     * Required minimum amount of pool pages plus 4MB. This is required so the
>> > +     * calls to p2m_alloc_table and shadow_alloc_p2m_page below don't fail.
>> > +     */
>> > +    unsigned int min_pages = shadow_min_acceptable_pages(d) + 1024;
>> 
>> sh_min_allocation() also takes the memory size of the domain into
>> account. Aren't you therefore risking to regress larger guests by
>> instead using a fixed amount here? The more that ...
> 
> shadow_enabled is called by domain_create, and at this point the
> memory size of the guest is not yet known AFAICT. I assume the
> toolstack will make further hypercalls to set a suitable sized shadow
> memory pool after the domain has been created and before populating
> the physmap.

Hmm, good point, and no, I don't think there are subsequent calls
to shadow_enable(); at least I can't find an invocation of
XEN_DOMCTL_SHADOW_OP_ENABLE.

But then the correct course of action would be to suitably grow the
shadow pool as memory gets added to the domain (be it Dom0 or
a DomU). Sticking to a fixed value of 1024 can't very well be the
best course of action in all possible cases.

Jan




* Re: [PATCH for-4.12 2/8] amd/ntp: remove assert that prevents creating 2M MMIO entries
  2019-02-05 12:44           ` Jan Beulich
@ 2019-02-05 13:38             ` Roger Pau Monné
  2019-02-05 14:55               ` Jan Beulich
  2019-02-07 17:21               ` George Dunlap
  0 siblings, 2 replies; 62+ messages in thread
From: Roger Pau Monné @ 2019-02-05 13:38 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, Wei Liu, George Dunlap, Andrew Cooper,
	george.dunlap, xen-devel

On Tue, Feb 05, 2019 at 05:44:14AM -0700, Jan Beulich wrote:
> >>> On 05.02.19 at 11:40, <roger.pau@citrix.com> wrote:
> > On Tue, Feb 05, 2019 at 12:45:56AM -0700, Jan Beulich wrote:
> >> >>> On 04.02.19 at 18:18, <roger.pau@citrix.com> wrote:
> >> > On Mon, Feb 04, 2019 at 09:56:22AM -0700, Jan Beulich wrote:
> >> >> >>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
> >> >> > The assert was originally added to make sure that higher order
> >> >> > regions (> PAGE_ORDER_4K) could not be used to bypass the
> >> >> > mmio_ro_ranges check performed by p2m_type_to_flags.
> >> >> > 
> >> >> > This however is already checked in set_mmio_p2m_entry, which makes
> >> >> > sure that higher order mappings don't overlap with mmio_ro_ranges,
> >> >> > thus allowing the creation of high order MMIO mappings safely.
> >> >> 
> >> >> Well, the assertions were added to make sure no other code
> >> >> path appears that violates this requirement. Arguably e.g.
> >> >> set_identity_p2m_entry() could gain an order parameter and
> >> >> then try to establish larger p2m_mmio_direct entries.
> >> >> 
> >> >> Don't get me wrong, I don't object to the removal of the
> >> >> assertions, but the description makes it sound as if they were
> >> >> entirely redundant. Even better would be though if they
> >> >> could be extended to keep triggering in "bad" cases.
> >> > 
> >> > I could add something like:
> >> > 
> >> > ASSERT(!rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
> >> >                                 mfn_x(mfn) + PFN_DOWN(MB(2))));
> >> > 
> >> > I think this should be safe and would trigger in case of misuse.
> >> 
> >> Looks okay, if slightly extended (or made conditional) to exclude
> >> the addition of MB(2) to MFN_INVALID to wrap and potentially
> >> hit a r/o range in the low 1Mb.
> > 
> > Ack, so it would be:
> > 
> > ASSERT(mfn_eq(mfn, INVALID_MFN) ||
> >        !rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
> >                                 mfn_x(mfn) + PFN_DOWN(MB(2))));
> 
> But that's still dropping the other aspect of the original ASSERT():
> 
> >> >> > --- a/xen/arch/x86/mm/p2m-pt.c
> >> >> > +++ b/xen/arch/x86/mm/p2m-pt.c
> >> >> > @@ -668,7 +668,6 @@ p2m_pt_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn,
> >> >> >          }
> >> >> >  
> >> >> >          ASSERT(p2m_flags_to_type(flags) != p2m_ioreq_server);
> >> >> > -        ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
> 
> It also made sure that "valid" MFNs can't be used for mappings with
> p2m_mmio_direct type. Except that I realize now that this is wrong in
> certain cases, because MMIO pages may actually have "valid" MFNs.
> mfn_valid(), after all, only tells us whether there's a struct page_info
> for the MFN. I wonder if it's really this brokenness that you hit,
> rather than what is explained in the description.
> 
> When the assertion was introduced, MMIO wasn't handled by the
> code correctly anyway (!mfn_valid() MFNs would not have got any
> mappings at all in the 2M and 1G paths), whereas now we have
> p2m_allows_invalid_mfn() there. So the situation has become worse
> with other nearby changes. As a result I think we want to correct
> the assertion here alongside the addition of what you suggest
> above. What about
> 
>     if ( p2mt != p2m_mmio_direct )
>         ASSERT(mfn_valid(mfn) || (mfn_eq(mfn, INVALID_MFN) &&
>                p2m_allows_invalid_mfn(p2mt)));
>     else
>         ASSERT(!mfn_eq(mfn, INVALID_MFN) &&
>                !rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
>                                         mfn_x(mfn) + PFN_DOWN(MB(2))));

I would write it as 'if ( p2mt == p2m_mmio_direct ) ... else ...' but
apart from that LGTM. If you are fine with this adjustment I will
change it in preparation for v2.
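
That is, something along these lines (same checks, just with the
branches swapped):

    if ( p2mt == p2m_mmio_direct )
        ASSERT(!mfn_eq(mfn, INVALID_MFN) &&
               !rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
                                        mfn_x(mfn) + PFN_DOWN(MB(2))));
    else
        ASSERT(mfn_valid(mfn) || (mfn_eq(mfn, INVALID_MFN) &&
               p2m_allows_invalid_mfn(p2mt)));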

Thanks, Roger.


* Re: [PATCH for-4.12 4/8] x86/shadow: alloc enough pages so initialization doesn't fail
  2019-02-05 12:55       ` Jan Beulich
@ 2019-02-05 13:52         ` Roger Pau Monné
  2019-02-05 15:15           ` Jan Beulich
  0 siblings, 1 reply; 62+ messages in thread
From: Roger Pau Monné @ 2019-02-05 13:52 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, xen-devel

On Tue, Feb 05, 2019 at 05:55:50AM -0700, Jan Beulich wrote:
> >>> On 05.02.19 at 12:47, <roger.pau@citrix.com> wrote:
> > On Tue, Feb 05, 2019 at 04:21:52AM -0700, Jan Beulich wrote:
> >> >>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
> >> > Current code in shadow_enable will allocate a shadow pool of 4MB
> >> > regardless of the values of sh_min_allocation or
> >> > shadow_min_acceptable_pages, which means that calls to
> >> > shadow_alloc_p2m_page can fail even after the check and allocation
> >> > done just above.
> >> > 
> >> > Fix this by always checking that the pool is big enough so the rest of
> >> > the shadow_init function cannot fail due to lack of pages in the
> >> > shadow pool. This is relevant to shadow_alloc_p2m_page which requires
> >> > a minimum amount of shadow_min_acceptable_pages(d) + 1 in the pool.
> >> > 
> >> > This allows booting a guest using shadow and more than 6 vCPUs.
> >> 
> >> I'm routinely booting 8-vCPU guests without issues.
> > 
> > For me the following simple example with 8 vcpus doesn't work:
> > 
> > # cat test.cfg
> > name = "test"
> > type = "hvm"
> > 
> > memory = 256
> 
> I admit I've never tried this small a guest with ...
> 
> > vcpus = 8
> 
> ... this many vCPU-s.

I don't think the amount of guest memory matters here, the following
example with 8G of RAM and 8 vCPUs fails in the same way:

# cat test.c
test.c       test.c.gcov  test.cfg     test.core
root@:~ # cat test.cfg
name = "test"
type = "hvm"

memory = 8192
vcpus = 8
hap = 0
# xl create test.cfg
Parsing config from test.cfg
libxl: error: libxl_create.c:578:libxl__domain_make: domain creation fail: Cannot allocate memory
libxl: error: libxl_create.c:975:initiate_domain_create: cannot make domain: -3

And I think that's a perfectly suitable guest config.

> 
> > hap = 0
> > # xl create test.cfg
> > Parsing config from test.cfg
> > libxl: error: libxl_create.c:578:libxl__domain_make: domain creation fail: 
> > Cannot allocate memory
> > libxl: error: libxl_create.c:975:initiate_domain_create: cannot make domain: 
> > -3
> > 
> >> 
> >> > --- a/xen/arch/x86/mm/shadow/common.c
> >> > +++ b/xen/arch/x86/mm/shadow/common.c
> >> > @@ -2705,6 +2705,11 @@ int shadow_enable(struct domain *d, u32 mode)
> >> >      uint32_t *e;
> >> >      int rv = 0;
> >> >      struct p2m_domain *p2m = p2m_get_hostp2m(d);
> >> > +    /*
> >> > +     * Required minimum amount of pool pages plus 4MB. This is required so the
> >> > +     * calls to p2m_alloc_table and shadow_alloc_p2m_page below don't fail.
> >> > +     */
> >> > +    unsigned int min_pages = shadow_min_acceptable_pages(d) + 1024;
> >> 
> >> sh_min_allocation() also takes the memory size of the domain into
> >> account. Aren't you therefore risking to regress larger guests by
> >> instead using a fixed amount here? The more that ...
> > 
> > shadow_enabled is called by domain_create, and at this point the
> > memory size of the guest is not yet known AFAICT. I assume the
> > toolstack will make further hypercalls to set a suitable sized shadow
> > memory pool after the domain has been created and before populating
> > the physmap.
> 
> Hmm, good point, and no, I don't think there are subsequent calls
> to shadow_enable(); at least I can't find an invocation of
> XEN_DOMCTL_SHADOW_OP_ENABLE.

You can expand the shadow pool using
XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION, there's no need to call
XEN_DOMCTL_SHADOW_OP_ENABLE for that.
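
For reference, a minimal sketch of what such a pool resize looks like
from the toolstack side, using the raw domctl rather than the libxl
wrapper (the 256MB value is purely illustrative, and do_domctl() here
is libxc's internal helper for issuing domctls):

    struct xen_domctl domctl = {
        .cmd = XEN_DOMCTL_shadow_op,
        .domain = domid,
        .u.shadow_op = {
            .op = XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION,
            .mb = 256,              /* illustrative: 256MB shadow pool */
        },
    };

    rc = do_domctl(xch, &domctl);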

In fact I'm not sure what's the point of XEN_DOMCTL_SHADOW_OP_ENABLE,
since shadow is enabled when the domain is created (domain_create)
with the XEN_DOMCTL_createdomain domctl, and at this point the memory
size of the domain is not yet known by the hypervisor.

Maybe you can create a HAP or PV domain and turn shadow on
afterwards?

> But then the correct course of action would be to suitably grow the
> shadow pool as memory gets added to the domain (be it Dom0 or
> a DomU). Sticking to a fixed value of 1024 can't very well be the
> best course of action in all possible cases.

Right, but it turns out 1024 (4MB) is not suitable given the example
above. I'm open to other options, but IMO this needs to be fixed for
4.12 in one way or another.

Thanks, Roger.


* Re: [PATCH for-4.12 3/8] iommu/pvh: add reserved regions below 1MB to the iommu page tables
  2019-02-05 12:49       ` Jan Beulich
@ 2019-02-05 14:01         ` Roger Pau Monné
  2019-02-05 15:18           ` Jan Beulich
  0 siblings, 1 reply; 62+ messages in thread
From: Roger Pau Monné @ 2019-02-05 14:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Juergen Gross, xen-devel

On Tue, Feb 05, 2019 at 05:49:07AM -0700, Jan Beulich wrote:
> >>> On 05.02.19 at 12:15, <roger.pau@citrix.com> wrote:
> > On Tue, Feb 05, 2019 at 03:47:49AM -0700, Jan Beulich wrote:
> >> >>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
> >> > Reserved memory ranges below 1MB on a PVH dom0 are added to the HAP
> >> > page tables, but due to this being done before setting up the IOMMU
> >> > the non RAM regions in those areas are not added to the IOMMU page
> >> > tables. Fix this by making sure any reserved regions below 1MB are
> >> > added to the IOMMU page tables.
> >> 
> >> So what was the reason again that we call iommu_hwdom_init() after
> >> pvh_setup_p2m()? Am I mis-remembering there having been a patch
> >> to flip their order?
> > 
> > Yes - IIRC I found (broken) hardware that requires the iommu page
> > tables to also contain certain RAM regions or else you get page faults
> > or a complete system freeze when the iommu is enabled.
> > 
> > It could be argued that even with this workaround such hardware is
> > still likely to be broken anyway, because RAM regions are not identity
> > mapped. From the emails I can find I was only able to observe this
> > behavior with pre-Haswell hardware.
> > 
> > Maybe it would be easier to just enable the iommu before populating
> > the RAM regions in the p2m? That would simplify the code here.
> 
> When, hence my ordering question.
> 
> >> > --- a/xen/drivers/passthrough/x86/iommu.c
> >> > +++ b/xen/drivers/passthrough/x86/iommu.c
> >> > @@ -151,12 +151,7 @@ static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
> >> >       * inclusive mapping additionally maps in every pfn up to 4GB except those
> >> >       * that fall in unusable ranges for PV Dom0.
> >> >       */
> >> > -    if ( (pfn > max_pfn && !mfn_valid(mfn)) || xen_in_range(pfn) ||
> >> > -         /*
> >> > -          * Ignore any address below 1MB, that's already identity mapped by the
> >> > -          * Dom0 builder for HVM.
> >> > -          */
> >> > -         (!d->domain_id && is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) )
> >> 
> >> There was a domain ID check here, and the comment explicitly said
> >> Dom0.
> >> 
> >> > +    if ( (pfn > max_pfn && !mfn_valid(mfn)) || xen_in_range(pfn) )
> >> >          return false;
> >> >  
> >> >      switch ( type = page_get_ram_type(mfn) )
> >> > @@ -245,7 +240,12 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
> >> >          if ( !hwdom_iommu_map(d, pfn, max_pfn) )
> >> >              continue;
> >> >  
> >> > -        if ( paging_mode_translate(d) )
> >> > +        /*
> >> > +         * Don't add any address below 1MB to the HAP page tables, that's
> >> > +         * already done by the domain builder. Add addresses below 1MB to the
> >> > +         * IOMMU page tables only.
> >> > +         */
> >> > +        if ( paging_mode_translate(d) && pfn >= PFN_DOWN(MB(1)) )
> >> 
> >> Nothing like this here. Did you determine that in the late hwdom
> >> case things work without that extra precaution (i.e. the removed
> >> check was really pointless)? If so, mentioning this would be helpful
> >> (at the very least to be sure this was intentional).
> > 
> > We don't currently have support for a pvh late-hwdom AFAIK, and
> > whether this check is necessary or not depends on how such pvh
> > late-hwdom is built, explicitly how the low 1MB is handled.
> 
> Well, till now I've been assuming that the late hwdom (in the PV case)
> would be built using the normal tool stack logic. I would then extend
> this to PVH, and expect Xen to take care of the delta between what
> the tool stack does and what the hardware domain needs.

Well, I think that non-trivial changes would need to be made to
the toolstack in order to create a pvh late-hwdom. For one, the
physmap of a pvh hwdom needs to match the native one, and there's no
logic in the toolstack at all to do this.

My point is that making such an adjustment here for a pvh late-hwdom is
likely a red herring (or maybe not even needed, or even wrong), and there's
a lot more work to do in order to be able to create a pvh
late-hwdom.

> > Maybe it's better to just forget about the pre-haswell workarounds and
> > enable the iommu before populating the p2m, that would certainly
> > simplify the code here by removing the low 1MB special casing.
> 
> Are you convinced that those workarounds are attributable to the
> CPU family, and that hence with Haswell and newer they're gone
> altogether?

Not sure, I guess it's more likely part of the chipset rather than the CPU
itself? But since chipsets are usually paired with CPU families, it's
quite likely the bogus chipset was only used in conjunction with
pre-Haswell CPUs.

Anyway, I'm happy to change the order so that the iommu is enabled
before the p2m is populated and then drop this workaround from the
iommu code. Would you be fine with such a change?

Thanks, Roger.


* Re: [PATCH for-4.12 2/8] amd/ntp: remove assert that prevents creating 2M MMIO entries
  2019-02-05 13:38             ` Roger Pau Monné
@ 2019-02-05 14:55               ` Jan Beulich
  2019-02-07 17:21               ` George Dunlap
  1 sibling, 0 replies; 62+ messages in thread
From: Jan Beulich @ 2019-02-05 14:55 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Juergen Gross, Wei Liu, George Dunlap, Andrew Cooper,
	george.dunlap, xen-devel

>>> On 05.02.19 at 14:38, <roger.pau@citrix.com> wrote:
> On Tue, Feb 05, 2019 at 05:44:14AM -0700, Jan Beulich wrote:
>> >>> On 05.02.19 at 11:40, <roger.pau@citrix.com> wrote:
>> > On Tue, Feb 05, 2019 at 12:45:56AM -0700, Jan Beulich wrote:
>> >> >>> On 04.02.19 at 18:18, <roger.pau@citrix.com> wrote:
>> >> > On Mon, Feb 04, 2019 at 09:56:22AM -0700, Jan Beulich wrote:
>> >> >> >>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
>> >> >> > The assert was originally added to make sure that higher order
>> >> >> > regions (> PAGE_ORDER_4K) could not be used to bypass the
>> >> >> > mmio_ro_ranges check performed by p2m_type_to_flags.
>> >> >> > 
>> >> >> > This however is already checked in set_mmio_p2m_entry, which makes
>> >> >> > sure that higher order mappings don't overlap with mmio_ro_ranges,
>> >> >> > thus allowing the creation of high order MMIO mappings safely.
>> >> >> 
>> >> >> Well, the assertions were added to make sure no other code
>> >> >> path appears that violates this requirement. Arguably e.g.
>> >> >> set_identity_p2m_entry() could gain an order parameter and
>> >> >> then try to establish larger p2m_mmio_direct entries.
>> >> >> 
>> >> >> Don't get me wrong, I don't object to the removal of the
>> >> >> assertions, but the description makes it sound as if they were
>> >> >> entirely redundant. Even better would be though if they
>> >> >> could be extended to keep triggering in "bad" cases.
>> >> > 
>> >> > I could add something like:
>> >> > 
>> >> > ASSERT(!rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
>> >> >                                 mfn_x(mfn) + PFN_DOWN(MB(2))));
>> >> > 
>> >> > I think this should be safe and would trigger in case of misuse.
>> >> 
>> >> Looks okay, if slightly extended (or made conditional) to exclude
>> >> the addition of MB(2) to MFN_INVALID to wrap and potentially
>> >> hit a r/o range in the low 1Mb.
>> > 
>> > Ack, so it would be:
>> > 
>> > ASSERT(mfn_eq(mfn, INVALID_MFN) ||
>> >        !rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
>> >                                 mfn_x(mfn) + PFN_DOWN(MB(2))));
>> 
>> But that's still dropping the other aspect of the original ASSERT():
>> 
>> >> >> > --- a/xen/arch/x86/mm/p2m-pt.c
>> >> >> > +++ b/xen/arch/x86/mm/p2m-pt.c
>> >> >> > @@ -668,7 +668,6 @@ p2m_pt_set_entry(struct p2m_domain *p2m, gfn_t gfn_, 
> mfn_t mfn,
>> >> >> >          }
>> >> >> >  
>> >> >> >          ASSERT(p2m_flags_to_type(flags) != p2m_ioreq_server);
>> >> >> > -        ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
>> 
>> It also made sure that "valid" MFNs can't be used for mappings with
>> p2m_mmio_direct type. Except that I realize now that this is wrong in
>> certain cases, because MMIO pages may actually have "valid" MFNs.
>> mfn_valid(), after all, only tells us whether there's a struct page_info
>> for the MFN. I wonder if it's really this brokenness that you hit,
>> rather than what is explained in the description.
>> 
>> When the assertion was introduced, MMIO wasn't handled by the
>> code correctly anyway (!mfn_valid() MFNs would not have got any
>> mappings at all in the 2M and 1G paths), whereas now we have
>> p2m_allows_invalid_mfn() there. So the situation has become worse
>> with other nearby changes. As a result I think we want to correct
>> the assertion here alongside the addition of what you suggest
>> above. What about
>> 
>>     if ( p2mt != p2m_mmio_direct )
>>         ASSERT(mfn_valid(mfn) || (mfn_eq(mfn, INVALID_MFN) &&
>>                p2m_allows_invalid_mfn(p2mt)));
>>     else
>>         ASSERT(!mfn_eq(mfn, INVALID_MFN) &&
>>                !rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
>>                                         mfn_x(mfn) + PFN_DOWN(MB(2))));
> 
> I would write it as 'if ( p2mt == p2m_mmio_direct ) ... else ...' but
> apart from that LGTM. If you are fine with this adjustment I will
> change it in preparation for v2.

Oh, sure - what's if and what's else doesn't really matter here. You
could even use ?: inside the ASSERT() if you wanted to.

Jan



* Re: [PATCH for-4.12 4/8] x86/shadow: alloc enough pages so initialization doesn't fail
  2019-02-05 13:52         ` Roger Pau Monné
@ 2019-02-05 15:15           ` Jan Beulich
  2019-02-05 15:53             ` Roger Pau Monné
  0 siblings, 1 reply; 62+ messages in thread
From: Jan Beulich @ 2019-02-05 15:15 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, xen-devel

>>> On 05.02.19 at 14:52, <roger.pau@citrix.com> wrote:
> On Tue, Feb 05, 2019 at 05:55:50AM -0700, Jan Beulich wrote:
>> >>> On 05.02.19 at 12:47, <roger.pau@citrix.com> wrote:
>> > On Tue, Feb 05, 2019 at 04:21:52AM -0700, Jan Beulich wrote:
>> >> >>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
>> >> > Current code in shadow_enable will allocate a shadow pool of 4MB
>> >> > regardless of the values of sh_min_allocation or
>> >> > shadow_min_acceptable_pages, which means that calls to
>> >> > shadow_alloc_p2m_page can fail even after the check and allocation
>> >> > done just above.
>> >> > 
>> >> > Fix this by always checking that the pool is big enough so the rest of
>> >> > the shadow_init function cannot fail due to lack of pages in the
>> >> > shadow pool. This is relevant to shadow_alloc_p2m_page which requires
>> >> > a minimum amount of shadow_min_acceptable_pages(d) + 1 in the pool.
>> >> > 
>> >> > This allows booting a guest using shadow and more than 6 vCPUs.
>> >> 
>> >> I'm routinely booting 8-vCPU guests without issues.
>> > 
>> > For me the following simple example with 8 vcpus doesn't work:
>> > 
>> > # cat test.cfg
>> > name = "test"
>> > type = "hvm"
>> > 
>> > memory = 256
>> 
>> I admit I've never tried this small a guest with ...
>> 
>> > vcpus = 8
>> 
>> ... this many vCPU-s.
> 
> I don't think the amount of guest memory matters here, the following
> example with 8G of RAM and 8 vCPUs fails in the same way:
> 
> # cat test.c
> test.c       test.c.gcov  test.cfg     test.core
> root@:~ # cat test.cfg
> name = "test"
> type = "hvm"
> 
> memory = 8192
> vcpus = 8
> hap = 0
> # xl create test.cfg
> Parsing config from test.cfg
> libxl: error: libxl_create.c:578:libxl__domain_make: domain creation fail: 
> Cannot allocate memory
> libxl: error: libxl_create.c:975:initiate_domain_create: cannot make domain: 
> -3
> 
> And I think that's a perfectly suitable guest config.

Indeed. And it doesn't seem to work for me anymore either. Must be
a regression, as I'm pretty sure it did still work not all that long ago.
Not even "shadow_memory=256" helps.

>> >> > --- a/xen/arch/x86/mm/shadow/common.c
>> >> > +++ b/xen/arch/x86/mm/shadow/common.c
>> >> > @@ -2705,6 +2705,11 @@ int shadow_enable(struct domain *d, u32 mode)
>> >> >      uint32_t *e;
>> >> >      int rv = 0;
>> >> >      struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> >> > +    /*
>> >> > +     * Required minimum amount of pool pages plus 4MB. This is required so the
>> >> > +     * calls to p2m_alloc_table and shadow_alloc_p2m_page below don't fail.
>> >> > +     */
>> >> > +    unsigned int min_pages = shadow_min_acceptable_pages(d) + 1024;
>> >> 
>> >> sh_min_allocation() also takes the memory size of the domain into
>> >> account. Aren't you therefore risking to regress larger guests by
>> >> instead using a fixed amount here? The more that ...
>> > 
>> > shadow_enabled is called by domain_create, and at this point the
>> > memory size of the guest is not yet known AFAICT. I assume the
>> > toolstack will make further hypercalls to set a suitable sized shadow
>> > memory pool after the domain has been created and before populating
>> > the physmap.
>> 
>> Hmm, good point, and no, I don't think there are subsequent calls
>> to shadow_enable(); at least I can't find an invocation of
>> XEN_DOMCTL_SHADOW_OP_ENABLE.
> 
> You can expand the shadow pool using
> XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION, there's no need to call
> XEN_DOMCTL_SHADOW_OP_ENABLE for that.

Ah, right. And libxl has a SET_ALLOCATION invocation.

> In fact I'm not sure what's the point of XEN_DOMCTL_SHADOW_OP_ENABLE,
> since shadow is enabled when the domain is created (domain_create)
> with the XEN_DOMCTL_createdomain domctl, and at this point the memory
> size of the domain is not yet known by the hypervisor.

I guess it served a more significant purpose in the past.

> Maybe you can create a HAP or PV domain and turn shadow on
> afterwards?

HAP - I don't think so. PV - sure, for log-dirty mode.

>> But then the correct course of action would be to suitably grow the
>> shadow pool as memory gets added to the domain (be it Dom0 or
>> a DomU). Sticking to a fixed value of 1024 can't very well be the
>> best course of action in all possible cases.
> 
> Right, but it turns out 1024 (4MB) is not suitable given the example
> above. I'm open to other options, but IMO this needs to be fixed for
> 4.12 in one way or another.

Yes, and not just for 4.12, unless this is an issue there only. Not sure
what other options you're after; I think I've said what I think would be
needed. Or maybe that remark was rather towards others?

Jan



* Re: [PATCH for-4.12 3/8] iommu/pvh: add reserved regions below 1MB to the iommu page tables
  2019-02-05 14:01         ` Roger Pau Monné
@ 2019-02-05 15:18           ` Jan Beulich
  2019-02-05 15:45             ` Roger Pau Monné
  0 siblings, 1 reply; 62+ messages in thread
From: Jan Beulich @ 2019-02-05 15:18 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Juergen Gross, xen-devel

>>> On 05.02.19 at 15:01, <roger.pau@citrix.com> wrote:
> On Tue, Feb 05, 2019 at 05:49:07AM -0700, Jan Beulich wrote:
>> >>> On 05.02.19 at 12:15, <roger.pau@citrix.com> wrote:
>> > On Tue, Feb 05, 2019 at 03:47:49AM -0700, Jan Beulich wrote:
>> >> >>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
>> >> > --- a/xen/drivers/passthrough/x86/iommu.c
>> >> > +++ b/xen/drivers/passthrough/x86/iommu.c
>> >> > @@ -151,12 +151,7 @@ static bool __hwdom_init hwdom_iommu_map(const struct domain *d,
>> >> >       * inclusive mapping additionally maps in every pfn up to 4GB except those
>> >> >       * that fall in unusable ranges for PV Dom0.
>> >> >       */
>> >> > -    if ( (pfn > max_pfn && !mfn_valid(mfn)) || xen_in_range(pfn) ||
>> >> > -         /*
>> >> > -          * Ignore any address below 1MB, that's already identity mapped by the
>> >> > -          * Dom0 builder for HVM.
>> >> > -          */
>> >> > -         (!d->domain_id && is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) )
>> >> 
>> >> There was a domain ID check here, and the comment explicitly said
>> >> Dom0.
>> >> 
>> >> > +    if ( (pfn > max_pfn && !mfn_valid(mfn)) || xen_in_range(pfn) )
>> >> >          return false;
>> >> >  
>> >> >      switch ( type = page_get_ram_type(mfn) )
>> >> > @@ -245,7 +240,12 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
>> >> >          if ( !hwdom_iommu_map(d, pfn, max_pfn) )
>> >> >              continue;
>> >> >  
>> >> > -        if ( paging_mode_translate(d) )
>> >> > +        /*
>> >> > +         * Don't add any address below 1MB to the HAP page tables, that's
>> >> > +         * already done by the domain builder. Add addresses below 1MB to the
>> >> > +         * IOMMU page tables only.
>> >> > +         */
>> >> > +        if ( paging_mode_translate(d) && pfn >= PFN_DOWN(MB(1)) )
>> >> 
>> >> Nothing like this here. Did you determine that in the late hwdom
>> >> case things work without that extra precaution (i.e. the removed
>> >> check was really pointless)? If so, mentioning this would be helpful
>> >> (at the very least to be sure this was intentional).
>> > 
>> > We don't currently have support for a pvh late-hwdom AFAIK, and
>> > whether this check is necessary or not depends on how such pvh
>> > late-hwdom is built, explicitly how the low 1MB is handled.
>> 
>> Well, till now I've been assuming that the late hwdom (in the PV case)
>> would be built using the normal tool stack logic. I would then extend
>> this to PVH, and expect Xen to take care of the delta between what
>> the tool stack does and what the hardware domain needs.
> 
> Well, I think that non-trivial changes would need to be performed to
> the toolstack in order to create a pvh late-hwdom. For once the
> physmap of a pvh hwdom needs to match the native one, and there's no
> logic in the toolstack at all to do this.
> 
> My point is that making such adjustment here for a pvh late-hwdom is
> likely a red herring (or maybe not even needed or wrong), and there's
> a lot more work to do in order to be able to create a pvh
> late-hwdom.

Okay then, as long as this gets made clear as an intentional
change in the description.

>> > Maybe it's better to just forget about the pre-haswell workarounds and
>> > enable the iommu before populating the p2m, that would certainly
>> > simply the code here by removing the low 1MB special casing.
>> 
>> Are you convinced that those workarounds are attributable to the
>> CPU family, and that hence with Haswell and newer they're gone
>> altogether?
> 
> Not sure, I guess it's more likely part of the chipset rather the CPU
> itself? But since chipsets are usually paired with CPU families, it's
> quite likely the bogus chipset was only used in conjunction with
> pre-Haswell CPUs.

I'd expect it's largely the firmware screwing things up.

> Anyway, I'm happy to change the order so that the iommu is enabled
> before the p2m is populated and then drop this workaround from the
> iommu code. Would you be fine with such a change?

Personally I would be, but if the implication would be that PVH won't
work on pre-Haswell anymore, then I think this can't be settled
just between the two of us.

Jan



* Re: [PATCH for-4.12 3/8] iommu/pvh: add reserved regions below 1MB to the iommu page tables
  2019-02-05 15:18           ` Jan Beulich
@ 2019-02-05 15:45             ` Roger Pau Monné
  0 siblings, 0 replies; 62+ messages in thread
From: Roger Pau Monné @ 2019-02-05 15:45 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Juergen Gross, xen-devel

On Tue, Feb 05, 2019 at 08:18:52AM -0700, Jan Beulich wrote:
> >>> On 05.02.19 at 15:01, <roger.pau@citrix.com> wrote:
> > On Tue, Feb 05, 2019 at 05:49:07AM -0700, Jan Beulich wrote:
> >> >>> On 05.02.19 at 12:15, <roger.pau@citrix.com> wrote:
> >> > On Tue, Feb 05, 2019 at 03:47:49AM -0700, Jan Beulich wrote:
> >> >> >>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
> >> > Maybe it's better to just forget about the pre-haswell workarounds and
> >> > enable the iommu before populating the p2m, that would certainly
> >> > simplify the code here by removing the low 1MB special casing.
> >> 
> >> Are you convinced that those workarounds are attributable to the
> >> CPU family, and that hence with Haswell and newer they're gone
> >> altogether?
> > 
> > Not sure, I guess it's more likely part of the chipset rather the CPU
> > itself? But since chipsets are usually paired with CPU families, it's
> > quite likely the bogus chipset was only used in conjunction with
> > pre-Haswell CPUs.
> 
> I'd expect it's largely the firmware screwing things up.
> 
> > Anyway, I'm happy to change the order so that the iommu is enabled
> > before the p2m is populated and then drop this workaround from the
> > iommu code. Would you be fine with such a change?
> 
> Personally I would be, but if the implication would be that PVH won't
> work on pre-Haswell anymore, then I think this can't be settled
> just between the two of us.

I no longer have that pre-Haswell system, and when I reported
this to Intel the response I got was that they were unable to
reproduce the issue, so it might be something quite specific to my
test box.

My suggestion would be to switch to the normal flow (iommu init first,
then populate p2m RAM regions) and wait for complaints. ATM I have no
way to assert that the current code would boot on that broken system
anyway, and PVH dom0 is still experimental so people playing with it
should be expecting to find issues and test fixes.
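
In code terms the reordering would roughly amount to swapping the two
calls during PVH dom0 construction (illustrative sketch only, not an
actual diff; error handling abbreviated):

    /* Set up the IOMMU page tables first ... */
    iommu_hwdom_init(d);

    /* ... and only then populate RAM regions in the p2m. */
    rc = pvh_setup_p2m(d);
    if ( rc )
        return rc;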

Thanks, Roger.


* Re: [PATCH for-4.12 4/8] x86/shadow: alloc enough pages so initialization doesn't fail
  2019-02-05 15:15           ` Jan Beulich
@ 2019-02-05 15:53             ` Roger Pau Monné
  2019-02-05 17:32               ` Jan Beulich
  0 siblings, 1 reply; 62+ messages in thread
From: Roger Pau Monné @ 2019-02-05 15:53 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, xen-devel

On Tue, Feb 05, 2019 at 08:15:27AM -0700, Jan Beulich wrote:
> >>> On 05.02.19 at 14:52, <roger.pau@citrix.com> wrote:
> > On Tue, Feb 05, 2019 at 05:55:50AM -0700, Jan Beulich wrote:
> >> >>> On 05.02.19 at 12:47, <roger.pau@citrix.com> wrote:
> >> > On Tue, Feb 05, 2019 at 04:21:52AM -0700, Jan Beulich wrote:
> >> >> >>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
> >> >> > Current code in shadow_enable will allocate a shadow pool of 4MB
> >> >> > regardless of the values of sh_min_allocation or
> >> >> > shadow_min_acceptable_pages, which means that calls to
> >> >> > shadow_alloc_p2m_page can fail even after the check and allocation
> >> >> > done just above.
> >> >> > 
> >> >> > Fix this by always checking that the pool is big enough so the rest of
> >> >> > the shadow_init function cannot fail due to lack of pages in the
> >> >> > shadow pool. This is relevant to shadow_alloc_p2m_page which requires
> >> >> > a minimum amount of shadow_min_acceptable_pages(d) + 1 in the pool.
> >> >> > 
> >> >> > This allows booting a guest using shadow and more than 6 vCPUs.
> >> >> 
> >> >> I'm routinely booting 8-vCPU guests without issues.
> >> > 
> >> > For me the following simple example with 8 vcpus doesn't work:
> >> > 
> >> > # cat test.cfg
> >> > name = "test"
> >> > type = "hvm"
> >> > 
> >> > memory = 256
> >> 
> >> I admit I've never tried this small a guest with ...
> >> 
> >> > vcpus = 8
> >> 
> >> ... this many vCPU-s.
> > 
> > I don't think the amount of guest memory matters here, the following
> > example with 8G of RAM and 8 vCPUs fails in the same way:
> > 
> > # cat test.c
> > test.c       test.c.gcov  test.cfg     test.core
> > root@:~ # cat test.cfg
> > name = "test"
> > type = "hvm"
> > 
> > memory = 8192
> > vcpus = 8
> > hap = 0
> > # xl create test.cfg
> > Parsing config from test.cfg
> > libxl: error: libxl_create.c:578:libxl__domain_make: domain creation fail: 
> > Cannot allocate memory
> > libxl: error: libxl_create.c:975:initiate_domain_create: cannot make domain: 
> > -3
> > 
> > And I think that's a perfectly suitable guest config.
> 
> Indeed. And it doesn't seem to work for me anymore either. Must be
> a regression, as I'm pretty sure it did still work not all that long ago.
> Not even "shadow_memory=256" helps.

No, because shadow_init is called from domain_create, and it's
impossible to increase the shadow memory pool before the domain is
actually created.

> 
> >> >> > --- a/xen/arch/x86/mm/shadow/common.c
> >> >> > +++ b/xen/arch/x86/mm/shadow/common.c
> >> >> > @@ -2705,6 +2705,11 @@ int shadow_enable(struct domain *d, u32 mode)
> >> >> >      uint32_t *e;
> >> >> >      int rv = 0;
> >> >> >      struct p2m_domain *p2m = p2m_get_hostp2m(d);
> >> >> > +    /*
> >> >> > +     * Required minimum amount of pool pages plus 4MB. This is required so the
> >> >> > +     * calls to p2m_alloc_table and shadow_alloc_p2m_page below don't fail.
> >> >> > +     */
> >> >> > +    unsigned int min_pages = shadow_min_acceptable_pages(d) + 1024;
> >> >> 
> >> >> sh_min_allocation() also takes the memory size of the domain into
> >> >> account. Aren't you therefore risking to regress larger guests by
> >> >> instead using a fixed amount here? The more that ...
> >> > 
> >> > shadow_enabled is called by domain_create, and at this point the
> >> > memory size of the guest is not yet known AFAICT. I assume the
> >> > toolstack will make further hypercalls to set a suitable sized shadow
> >> > memory pool after the domain has been created and before populating
> >> > the physmap.
> >> 
> >> Hmm, good point, and no, I don't think there are subsequent calls
> >> to shadow_enable(); at least I can't find an invocation of
> >> XEN_DOMCTL_SHADOW_OP_ENABLE.
> > 
> > You can expand the shadow pool using
> > XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION, there's no need to call
> > XEN_DOMCTL_SHADOW_OP_ENABLE for that.
> 
> Ah, right. And libxl has a SET_ALLOCATION invocation.
> 
> > In fact I'm not sure what's the point of XEN_DOMCTL_SHADOW_OP_ENABLE,
> > since shadow is enabled when the domain is created (domain_create)
> > with the XEN_DOMCTL_createdomain domctl, and at this point the memory
> > size of the domain is not yet known by the hypervisor.
> 
> I guess it served a more significant purpose in the past.
> 
> > Maybe you can create a HAP or PV domain and turn shadow on
> > afterwards?
> 
> HAP - I don't think so. PV - sure, for log-dirty mode.
> 
> >> But then the correct course of action would be to suitably grow the
> >> shadow pool as memory gets added to the domain (be it Dom0 or
> >> a DomU). Sticking to a fixed value of 1024 can't very well be the
> >> best course of action in all possible cases.
> > 
> > Right, but it turns out 1024 (4MB) is not suitable given the example
> > above. I'm open to other options, but IMO this needs to be fixed for
> > 4.12 in one way or another.
> 
> Yes, and not just for 4.12, unless this is an issue there only. Not sure
> what other options you're after; I think I've said what I think would be
> needed. Or maybe that remark was rather towards others?

I'm afraid I don't follow. You mention "grow the shadow pool as memory
gets added to the domain", but here the problem is that shadow_init
when called from domain_create (so a domain with 0 memory assigned)
already fails. I cannot see how that can be fixed by adding more
memory to the shadow pool when memory gets added to the domain, since
the domain has exactly 0 memory assigned when the error happens.

This is IMO an issue with the default size of the memory pool
allocated by shadow_init, or by shadow_alloc_p2m_page being to picky
about the memory it requires.

AFAICT this can either be fixed by increasing the initial shadow
memory pool (as done by this patch) or by relaxing the checks in
shadow_alloc_p2m_page, but increasing the shadow pool as memory gets
added to the domain is not going to solve it.

Thanks, Roger.


* Re: [PATCH for-4.12 4/8] x86/shadow: alloc enough pages so initialization doesn't fail
  2019-02-05 15:53             ` Roger Pau Monné
@ 2019-02-05 17:32               ` Jan Beulich
  2019-02-06  9:10                 ` Roger Pau Monné
  0 siblings, 1 reply; 62+ messages in thread
From: Jan Beulich @ 2019-02-05 17:32 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, xen-devel

>>> On 05.02.19 at 16:53, <roger.pau@citrix.com> wrote:
> On Tue, Feb 05, 2019 at 08:15:27AM -0700, Jan Beulich wrote:
>> >>> On 05.02.19 at 14:52, <roger.pau@citrix.com> wrote:
>> > I don't think the amount of guest memory matters here, the following
>> > example with 8G of RAM and 8 vCPUs fails in the same way:
>> > 
>> > # cat test.c
>> > test.c       test.c.gcov  test.cfg     test.core
>> > root@:~ # cat test.cfg
>> > name = "test"
>> > type = "hvm"
>> > 
>> > memory = 8192
>> > vcpus = 8
>> > hap = 0
>> > # xl create test.cfg
>> > Parsing config from test.cfg
>> > libxl: error: libxl_create.c:578:libxl__domain_make: domain creation fail: 
>> > Cannot allocate memory
>> > libxl: error: libxl_create.c:975:initiate_domain_create: cannot make domain: 
>> > -3
>> > 
>> > And I think that's a perfectly suitable guest config.
>> 
>> Indeed. And it doesn't seem to work for me anymore either. Must be
>> a regression, as I'm pretty sure it did still work not all that long ago.
>> Not even "shadow_memory=256" helps.
> 
> No, because shadow_init is called from domain_create, and it's
> impossible to increase the shadow memory pool before the domain is
> actually created.

Okay, I misunderstood the problem initially. Aiui this is a
regression from the early setting of ->max_vcpus, as that now
causes shadow_min_acceptable_pages() to block far more pages
than it did before from use for p2m allocs during domain creation.
I think I see an alternative way of fixing this for the moment
(without adding re-size logic yet when d->tot_pages grows), but
this will have to wait until tomorrow.
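
Roughly the shape of the dependency, for illustration only (the
constants below are placeholders, not the values the real
shadow_min_acceptable_pages() uses):

    #define ILLUSTRATIVE_FIXED_PAGES    10u
    #define ILLUSTRATIVE_PER_VCPU_PAGES 128u

    static unsigned int min_acceptable_pages_sketch(const struct domain *d)
    {
        /*
         * With ->max_vcpus now set right at domain_create time, this
         * minimum is already large before any memory has been assigned
         * to the domain, starving shadow_alloc_p2m_page().
         */
        return ILLUSTRATIVE_FIXED_PAGES +
               d->max_vcpus * ILLUSTRATIVE_PER_VCPU_PAGES;
    }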

Jan




* Re: [PATCH for-4.12 4/8] x86/shadow: alloc enough pages so initialization doesn't fail
  2019-02-05 17:32               ` Jan Beulich
@ 2019-02-06  9:10                 ` Roger Pau Monné
  2019-02-06 11:02                   ` Jan Beulich
  0 siblings, 1 reply; 62+ messages in thread
From: Roger Pau Monné @ 2019-02-06  9:10 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, xen-devel

On Tue, Feb 05, 2019 at 10:32:08AM -0700, Jan Beulich wrote:
> >>> On 05.02.19 at 16:53, <roger.pau@citrix.com> wrote:
> > On Tue, Feb 05, 2019 at 08:15:27AM -0700, Jan Beulich wrote:
> >> >>> On 05.02.19 at 14:52, <roger.pau@citrix.com> wrote:
> >> > I don't think the amount of guest memory matters here, the following
> >> > example with 8G of RAM and 8 vCPUs fails in the same way:
> >> > 
> >> > # cat test.c
> >> > test.c       test.c.gcov  test.cfg     test.core
> >> > root@:~ # cat test.cfg
> >> > name = "test"
> >> > type = "hvm"
> >> > 
> >> > memory = 8192
> >> > vcpus = 8
> >> > hap = 0
> >> > # xl create test.cfg
> >> > Parsing config from test.cfg
> >> > libxl: error: libxl_create.c:578:libxl__domain_make: domain creation fail: 
> >> > Cannot allocate memory
> >> > libxl: error: libxl_create.c:975:initiate_domain_create: cannot make domain: 
> >> > -3
> >> > 
> >> > And I think that's a perfectly suitable guest config.
> >> 
> >> Indeed. And it doesn't seem to work for me anymore either. Must be
> >> a regression, as I'm pretty sure it did still work not all that long ago.
> >> Not even "shadow_memory=256" helps.
> > 
> > No, because shadow_init is called from domain_create, and it's
> > impossible to increase the shadow memory pool before the domain is
> > actually created.
> 
> Okay, I misunderstood the problem initially. Aiui this is a
> regression from the early setting of ->max_vcpus, as that now

Ahh, I wasn't able to figure out what caused this regression; it's
indeed caused by max_vcpus being set at domain_create.

> causes shadow_min_acceptable_pages() to block far more pages
> than it did before from use for p2m allocs during domain creation.
> I think I see an alternative way of fixing this for the moment
> (without adding re-size logic yet when d->tot_pages grows), but
> this will have to wait until tomorrow.

Ack, I'm happy to implement it if you tell me the plan :).

Maybe as an alternative the checks in shadow_alloc_p2m_page can be
relaxed during domain creation, or the p2m allocated when the first
memory page gets added to the domain?

Thanks, Roger.


* Re: [PATCH for-4.12 4/8] x86/shadow: alloc enough pages so initialization doesn't fail
  2019-02-06  9:10                 ` Roger Pau Monné
@ 2019-02-06 11:02                   ` Jan Beulich
  0 siblings, 0 replies; 62+ messages in thread
From: Jan Beulich @ 2019-02-06 11:02 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, xen-devel

>>> On 06.02.19 at 10:10, <roger.pau@citrix.com> wrote:
> On Tue, Feb 05, 2019 at 10:32:08AM -0700, Jan Beulich wrote:
>> >>> On 05.02.19 at 16:53, <roger.pau@citrix.com> wrote:
>> > On Tue, Feb 05, 2019 at 08:15:27AM -0700, Jan Beulich wrote:
>> >> >>> On 05.02.19 at 14:52, <roger.pau@citrix.com> wrote:
>> >> > I don't think the amount of guest memory matters here, the following
>> >> > example with 8G of RAM and 8 vCPUs fails in the same way:
>> >> > 
>> >> > # cat test.c
>> >> > test.c       test.c.gcov  test.cfg     test.core
>> >> > root@:~ # cat test.cfg
>> >> > name = "test"
>> >> > type = "hvm"
>> >> > 
>> >> > memory = 8192
>> >> > vcpus = 8
>> >> > hap = 0
>> >> > # xl create test.cfg
>> >> > Parsing config from test.cfg
>> >> > libxl: error: libxl_create.c:578:libxl__domain_make: domain creation fail: 
> 
>> >> > Cannot allocate memory
>> >> > libxl: error: libxl_create.c:975:initiate_domain_create: cannot make 
> domain: 
>> >> > -3
>> >> > 
>> >> > And I think that's a perfectly suitable guest config.
>> >> 
>> >> Indeed. And it doesn't seem to work for me anymore either. Must be
>> >> a regression, as I'm pretty sure it did still work not all that long ago.
>> >> Not even "shadow_memory=256" helps.
>> > 
>> > No, because shadow_init is called from domain_create, and it's
>> > impossible to increase the shadow memory pool before the domain is
>> > actually created.
>> 
>> Okay, I misunderstood the problem initially. Aiui this is a
>> regression from the early setting of ->max_vcpus, as that now
> 
> Ahh, I wasn't able to figure out what caused this regression, it's
> indeed caused by max_vcpus being set at domain_create.
> 
>> causes shadow_min_acceptable_pages() to block far more pages
>> than it did before from use for p2m allocs during domain creation.
>> I think I see an alternative way of fixing this for the moment
>> (without adding re-size logic yet when d->tot_pages grows), but
>> this will have to wait until tomorrow.
> 
> Ack, I'm happy to implement it if you tell me the plan :).

Well, the plan was hard to spell out without actually trying out
what is needed. You'll likely have seen already the resulting
patch I've sent out for discussion.

> Maybe as an alternative the checks in shadow_alloc_p2m_page can be
> relaxed during domain creation,

I've been considering this, but decided against it, because it
would undermine the functionality in case the tool stack would
not issue a set-allocation domctl. Mid-term (perhaps after 4.12)
I think we need to settle on a more uniform model here, e.g.
either always requiring a set-allocation request by the tool
stack, or making the code not depend on it at all.

> or the p2m allocated when the first memory page gets added to the domain?

This might be possible as well, but perhaps requires a more
intrusive change, in particular if you take into consideration
the VMX change which was a byproduct of the playing done
for this one.

Jan




* Re: [PATCH for-4.12 5/8] pvh/dom0: warn when dom0_mem is not set to a fixed value
  2019-01-30 10:36 ` [PATCH for-4.12 5/8] pvh/dom0: warn when dom0_mem is not set to a fixed value Roger Pau Monne
@ 2019-02-06 13:54   ` Jan Beulich
  2019-02-07 15:39     ` Roger Pau Monné
  2019-02-07 17:09   ` George Dunlap
  1 sibling, 1 reply; 62+ messages in thread
From: Jan Beulich @ 2019-02-06 13:54 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Juergen Gross, Andrew Cooper, Wei Liu, xen-devel

>>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
> There have been several reports of the dom0 builder running out of
> memory when buildign a PVH dom0 without havingf specified a dom0_mem

"building" and "having"

> value. Print a warning message if dom0_mem is not set to a fixed value
> when booting in PVH mode.

Why does it need to be a fixed value? I.e. why can't you simply
put this warning next to where the default gets established,
when nr_pages is zero?
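
I.e. something along the lines of the below, placed where the default
is computed (sketch only, the exact context in dom0_compute_nr_pages()
may differ):

    if ( !nr_pages )
    {
        if ( is_hvm_domain(d) )
            printk(XENLOG_WARNING
                   "PVH dom0 without dom0_mem: using a heuristic default\n");
        /* ... existing default size computation ... */
    }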

> --- a/xen/arch/x86/dom0_build.c
> +++ b/xen/arch/x86/dom0_build.c
> @@ -344,6 +344,10 @@ unsigned long __init dom0_compute_nr_pages(
>      if ( !dom0_mem_set && CONFIG_DOM0_MEM[0] )
>          parse_dom0_mem(CONFIG_DOM0_MEM);
>  
> +    if ( is_hvm_domain(d) && !dom0_size.nr_pages )
> +        printk(
> +"WARNING: consider setting dom0_mem to a fixed value when using PVH mode\n");

Pretty unusual indentation. Is there any reason for you doing so?

Jan




* Re: [PATCH 7/8] x86/mm: handle foreign mappings in p2m_entry_modify
  2019-01-30 10:36 ` [PATCH 7/8] x86/mm: handle foreign mappings in p2m_entry_modify Roger Pau Monne
@ 2019-02-06 16:59   ` Jan Beulich
  2019-02-07 16:53     ` Roger Pau Monné
  2019-02-07 17:49   ` George Dunlap
  1 sibling, 1 reply; 62+ messages in thread
From: Jan Beulich @ 2019-02-06 16:59 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Kevin Tian, Wei Liu, George Dunlap, Andrew Cooper, Tim Deegan,
	Jun Nakajima, xen-devel

>>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -933,9 +933,12 @@ struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
>                                                unsigned int *flags);
>  
>  static inline void p2m_entry_modify(struct p2m_domain *p2m, p2m_type_t nt,
> -                                    p2m_type_t ot, unsigned int level)
> +                                    p2m_type_t ot, mfn_t nfn, mfn_t ofn,
> +                                    unsigned int level)
>  {
> -    if ( level != 1 || nt == ot )
> +    struct page_info *pg;
> +
> +    if ( level != 1 || (nt == ot && mfn_eq(nfn, ofn)) )
>          return;
>  
>      switch ( nt )
> @@ -948,6 +951,17 @@ static inline void p2m_entry_modify(struct p2m_domain *p2m, p2m_type_t nt,
>          p2m->ioreq.entry_count++;
>          break;
>  
> +    case p2m_map_foreign:
> +        pg = mfn_to_page(nfn);
> +
> +        if ( !pg || !page_get_owner_and_reference(pg) )

mfn_to_page() can't return NULL, can it? You may want to ASSERT()
beforehand that the MFN is not INVALID_MFN, though.
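
I.e. something like (sketch of the suggested adjustment only):

    case p2m_map_foreign:
        ASSERT(!mfn_eq(nfn, INVALID_MFN));
        pg = mfn_to_page(nfn);

        if ( !page_get_owner_and_reference(pg) )
            ...

i.e. with the "!pg ||" part dropped, since mfn_to_page() can't return
NULL for a valid MFN.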

Jan




* Re: [PATCH for-4.12 5/8] pvh/dom0: warn when dom0_mem is not set to a fixed value
  2019-02-06 13:54   ` Jan Beulich
@ 2019-02-07 15:39     ` Roger Pau Monné
  2019-02-07 16:47       ` Jan Beulich
  0 siblings, 1 reply; 62+ messages in thread
From: Roger Pau Monné @ 2019-02-07 15:39 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Juergen Gross, Andrew Cooper, Wei Liu, xen-devel

On Wed, Feb 06, 2019 at 06:54:23AM -0700, Jan Beulich wrote:
> >>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
> > There have been several reports of the dom0 builder running out of
> > memory when buildign a PVH dom0 without havingf specified a dom0_mem
> 
> "building" and "having"
> 
> > value. Print a warning message if dom0_mem is not set to a fixed value
> > when booting in PVH mode.
> 
> Why does it need to be a fixed value? I.e. why can't you simply
> put this warning next to where the default gets established,
> when nr_pages is zero?

Ack, but I guess you likely also want to change the printed warning so
it does say "fixed"?

> > --- a/xen/arch/x86/dom0_build.c
> > +++ b/xen/arch/x86/dom0_build.c
> > @@ -344,6 +344,10 @@ unsigned long __init dom0_compute_nr_pages(
> >      if ( !dom0_mem_set && CONFIG_DOM0_MEM[0] )
> >          parse_dom0_mem(CONFIG_DOM0_MEM);
> >  
> > +    if ( is_hvm_domain(d) && !dom0_size.nr_pages )
> > +        printk(
> > +"WARNING: consider setting dom0_mem to a fixed value when using PVH mode\n");
> 
> Pretty unusual indentation. Is there any reason for you doing so?

Did it that way to avoid splitting and to attempt to keep the line as
short as possible. Would you prefer me to split the message?

Thanks, Roger.


* Re: [PATCH for-4.12 5/8] pvh/dom0: warn when dom0_mem is not set to a fixed value
  2019-02-07 15:39     ` Roger Pau Monné
@ 2019-02-07 16:47       ` Jan Beulich
  0 siblings, 0 replies; 62+ messages in thread
From: Jan Beulich @ 2019-02-07 16:47 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Juergen Gross, Andrew Cooper, Wei Liu, xen-devel

>>> On 07.02.19 at 16:39, <roger.pau@citrix.com> wrote:
> On Wed, Feb 06, 2019 at 06:54:23AM -0700, Jan Beulich wrote:
>> >>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
>> > There have been several reports of the dom0 builder running out of
>> > memory when buildign a PVH dom0 without havingf specified a dom0_mem
>> 
>> "building" and "having"
>> 
>> > value. Print a warning message if dom0_mem is not set to a fixed value
>> > when booting in PVH mode.
>> 
>> Why does it need to be a fixed value? I.e. why can't you simply
>> put this warning next to where the default gets established,
>> when nr_pages is zero?
> 
> Ack, but I guess you likely also want to change the printed warning so
> it does say "fixed"?

Did you mean '... so it doesn't say "fixed"'? If so - sure, the message
of course should reflect what is happening.

>> > --- a/xen/arch/x86/dom0_build.c
>> > +++ b/xen/arch/x86/dom0_build.c
>> > @@ -344,6 +344,10 @@ unsigned long __init dom0_compute_nr_pages(
>> >      if ( !dom0_mem_set && CONFIG_DOM0_MEM[0] )
>> >          parse_dom0_mem(CONFIG_DOM0_MEM);
>> >  
>> > +    if ( is_hvm_domain(d) && !dom0_size.nr_pages )
>> > +        printk(
>> > +"WARNING: consider setting dom0_mem to a fixed value when using PVH mode\n");
>> 
>> Pretty unusual indentation. Is there any reason for you doing so?
> 
> Did it that way to avoid splitting and to attempt to keep the line as
> short as possible. Would you prefer me to split the message?

Well, splitting after WARNING: seems reasonable and unlikely to get
in the way of grep-ing for the message. But if you think a split there
is undesirable, then put it all on one line.
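
I.e. (sketch, with the wording still to be adjusted as per above):

    printk("WARNING: "
           "consider setting dom0_mem to a fixed value when using PVH mode\n");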

Jan




* Re: [PATCH 7/8] x86/mm: handle foreign mappings in p2m_entry_modify
  2019-02-06 16:59   ` Jan Beulich
@ 2019-02-07 16:53     ` Roger Pau Monné
  2019-02-07 16:59       ` Jan Beulich
  0 siblings, 1 reply; 62+ messages in thread
From: Roger Pau Monné @ 2019-02-07 16:53 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Wei Liu, George Dunlap, Andrew Cooper, Tim Deegan,
	Jun Nakajima, xen-devel

On Wed, Feb 06, 2019 at 09:59:30AM -0700, Jan Beulich wrote:
> >>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
> > --- a/xen/include/asm-x86/p2m.h
> > +++ b/xen/include/asm-x86/p2m.h
> > @@ -933,9 +933,12 @@ struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
> >                                                unsigned int *flags);
> >  
> >  static inline void p2m_entry_modify(struct p2m_domain *p2m, p2m_type_t nt,
> > -                                    p2m_type_t ot, unsigned int level)
> > +                                    p2m_type_t ot, mfn_t nfn, mfn_t ofn,
> > +                                    unsigned int level)
> >  {
> > -    if ( level != 1 || nt == ot )
> > +    struct page_info *pg;
> > +
> > +    if ( level != 1 || (nt == ot && mfn_eq(nfn, ofn)) )
> >          return;
> >  
> >      switch ( nt )
> > @@ -948,6 +951,17 @@ static inline void p2m_entry_modify(struct p2m_domain *p2m, p2m_type_t nt,
> >          p2m->ioreq.entry_count++;
> >          break;
> >  
> > +    case p2m_map_foreign:
> > +        pg = mfn_to_page(nfn);
> > +
> > +        if ( !pg || !page_get_owner_and_reference(pg) )
> 
> mfn_to_page() can't return NULL, can it? You may want to ASSERT()
> beforehand that the MFN is not INVALID_MFN, though.

I've added ASSERT(mfn_valid(mfn)) and the same below. I think it's
safer to use mfn_valid rather than only checking against INVALID_MFN.
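
I.e. something like this for the new type (sketch only, using the nfn
name from the patch):

    case p2m_map_foreign:
        ASSERT(mfn_valid(nfn));
        pg = mfn_to_page(nfn);
        ...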

Thanks, Roger.


* Re: [PATCH 7/8] x86/mm: handle foreign mappings in p2m_entry_modify
  2019-02-07 16:53     ` Roger Pau Monné
@ 2019-02-07 16:59       ` Jan Beulich
  0 siblings, 0 replies; 62+ messages in thread
From: Jan Beulich @ 2019-02-07 16:59 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Kevin Tian, Wei Liu, George Dunlap, Andrew Cooper, Tim Deegan,
	Jun Nakajima, xen-devel

>>> On 07.02.19 at 17:53, <roger.pau@citrix.com> wrote:
> On Wed, Feb 06, 2019 at 09:59:30AM -0700, Jan Beulich wrote:
>> >>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
>> > --- a/xen/include/asm-x86/p2m.h
>> > +++ b/xen/include/asm-x86/p2m.h
>> > @@ -933,9 +933,12 @@ struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
>> >                                                unsigned int *flags);
>> >  
>> >  static inline void p2m_entry_modify(struct p2m_domain *p2m, p2m_type_t nt,
>> > -                                    p2m_type_t ot, unsigned int level)
>> > +                                    p2m_type_t ot, mfn_t nfn, mfn_t ofn,
>> > +                                    unsigned int level)
>> >  {
>> > -    if ( level != 1 || nt == ot )
>> > +    struct page_info *pg;
>> > +
>> > +    if ( level != 1 || (nt == ot && mfn_eq(nfn, ofn)) )
>> >          return;
>> >  
>> >      switch ( nt )
>> > @@ -948,6 +951,17 @@ static inline void p2m_entry_modify(struct p2m_domain *p2m, p2m_type_t nt,
>> >          p2m->ioreq.entry_count++;
>> >          break;
>> >  
>> > +    case p2m_map_foreign:
>> > +        pg = mfn_to_page(nfn);
>> > +
>> > +        if ( !pg || !page_get_owner_and_reference(pg) )
>> 
>> mfn_to_page() can't return NULL, can it? You may want to ASSERT()
>> beforehand that the MFN is not INVALID_MFN, though.
> 
> I've added ASSERT(mfn_valid(mfn)) and the same below. I think it's
> safer to use mfn_valid rather than only checking against INVALID_MFN.

Thanks, indeed I too did consider this afterwards.

Jan




* Re: [PATCH for-4.12 5/8] pvh/dom0: warn when dom0_mem is not set to a fixed value
  2019-01-30 10:36 ` [PATCH for-4.12 5/8] pvh/dom0: warn when dom0_mem is not set to a fixed value Roger Pau Monne
  2019-02-06 13:54   ` Jan Beulich
@ 2019-02-07 17:09   ` George Dunlap
  2019-02-07 17:48     ` Roger Pau Monné
  1 sibling, 1 reply; 62+ messages in thread
From: George Dunlap @ 2019-02-07 17:09 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Juergen Gross, xen-devel, Wei Liu, Jan Beulich, Andrew Cooper

On Wed, Jan 30, 2019 at 10:37 AM Roger Pau Monne <roger.pau@citrix.com> wrote:
>
> There have been several reports of the dom0 builder running out of
> memory when buildign a PVH dom0 without havingf specified a dom0_mem
> value. Print a warning message if dom0_mem is not set to a fixed value
> when booting in PVH mode.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: Juergen Gross <jgross@suse.com>
> ---
> Without this patch creating a PVH dom0 without a dom0_mem parameter
> can result in the dom0 builder running out of memory thus leading to a
> Xen crash. The added message gives a hit to the user about a possible
> fix.

But why would it run out of memory?  Wouldn't it be better to find and
fix the root cause?

 -George


* Re: [PATCH for-4.12 2/8] amd/ntp: remove assert that prevents creating 2M MMIO entries
  2019-02-05 13:38             ` Roger Pau Monné
  2019-02-05 14:55               ` Jan Beulich
@ 2019-02-07 17:21               ` George Dunlap
  2019-02-08 17:49                 ` Roger Pau Monné
  1 sibling, 1 reply; 62+ messages in thread
From: George Dunlap @ 2019-02-07 17:21 UTC (permalink / raw)
  To: Roger Pau Monné, Jan Beulich
  Cc: George Dunlap, Andrew Cooper, Wei Liu, Juergen Gross, xen-devel

On 2/5/19 1:38 PM, Roger Pau Monné wrote:
> On Tue, Feb 05, 2019 at 05:44:14AM -0700, Jan Beulich wrote:
>>>>> On 05.02.19 at 11:40, <roger.pau@citrix.com> wrote:
>>> On Tue, Feb 05, 2019 at 12:45:56AM -0700, Jan Beulich wrote:
>>>>>>> On 04.02.19 at 18:18, <roger.pau@citrix.com> wrote:
>>>>> On Mon, Feb 04, 2019 at 09:56:22AM -0700, Jan Beulich wrote:
>>>>>>>>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
>>>>>>> The assert was originally added to make sure that higher order
>>>>>>> regions (> PAGE_ORDER_4K) could not be used to bypass the
>>>>>>> mmio_ro_ranges check performed by p2m_type_to_flags.
>>>>>>>
>>>>>>> This however is already checked in set_mmio_p2m_entry, which makes
>>>>>>> sure that higher order mappings don't overlap with mmio_ro_ranges,
>>>>>>> thus allowing the creation of high order MMIO mappings safely.
>>>>>>
>>>>>> Well, the assertions were added to make sure no other code
>>>>>> path appears that violates this requirement. Arguably e.g.
>>>>>> set_identity_p2m_entry() could gain an order parameter and
>>>>>> then try to establish larger p2m_mmio_direct entries.
>>>>>>
>>>>>> Don't get me wrong, I don't object to the removal of the
>>>>>> assertions, but the description makes it sound as if they were
>>>>>> entirely redundant. Even better would be though if they
>>>>>> could be extended to keep triggering in "bad" cases.
>>>>>
>>>>> I could add something like:
>>>>>
>>>>> ASSERT(!rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
>>>>>                                 mfn_x(mfn) + PFN_DOWN(MB(2))));
>>>>>
>>>>> I think this should be safe and would trigger in case of misuse.
>>>>
>>>> Looks okay, if slightly extended (or made conditional) to exclude
>>>> the addition of MB(2) to MFN_INVALID to wrap and potentially
>>>> hit a r/o range in the low 1Mb.
>>>
>>> Ack, so it would be:
>>>
>>> ASSERT(mfn_eq(mfn, INVALID_MFN) ||
>>>        !rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
>>>                                 mfn_x(mfn) + PFN_DOWN(MB(2))));
>>
>> But that's still dropping the other aspect of the original ASSERT():
>>
>>>>>>> --- a/xen/arch/x86/mm/p2m-pt.c
>>>>>>> +++ b/xen/arch/x86/mm/p2m-pt.c
>>>>>>> @@ -668,7 +668,6 @@ p2m_pt_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn,
>>>>>>>          }
>>>>>>>  
>>>>>>>          ASSERT(p2m_flags_to_type(flags) != p2m_ioreq_server);
>>>>>>> -        ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
>>
>> It also made sure that "valid" MFNs can't be used for mappings with
>> p2m_mmio_direct type. Except that I realize now that this is wrong in
>> certain cases, because MMIO pages may actually have "valid" MFNs.
>> mfn_valid(), after all, only tells us whether there's a struct page_info
>> for the MFN. I wonder if it's really this brokenness that you hit,
>> rather than what is explained in the description.
>>
>> When the assertion was introduced, MMIO wasn't handled by the
>> code correctly anyway (!mfn_valid() MFNs would not have got any
>> mappings at all in the 2M and 1G paths), whereas now we have
>> p2m_allows_invalid_mfn() there. So the situation has become worse
>> with other nearby changes. As a result I think we want to correct
>> the assertion here alongside the addition of what you suggest
>> above. What about
>>
>>     if ( p2mt != p2m_mmio_direct )
>>         ASSERT(mfn_valid(mfn) || (mfn_eq(mfn, INVALID_MFN) &&
>>                p2m_allows_invalid_mfn(p2mt)));
>>     else
>>         ASSERT(!mfn_eq(mfn, INVALID_MFN) &&
>>                !rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
>>                                         mfn_x(mfn) + PFN_DOWN(MB(2))));

FWIW I agree with this approach (asserting !overlaps for p2m_mmio_direct
types).

 -George


* Re: [PATCH for-4.12 5/8] pvh/dom0: warn when dom0_mem is not set to a fixed value
  2019-02-07 17:09   ` George Dunlap
@ 2019-02-07 17:48     ` Roger Pau Monné
  2019-02-08 11:11       ` George Dunlap
  0 siblings, 1 reply; 62+ messages in thread
From: Roger Pau Monné @ 2019-02-07 17:48 UTC (permalink / raw)
  To: George Dunlap
  Cc: Juergen Gross, xen-devel, Wei Liu, Jan Beulich, Andrew Cooper

On Thu, Feb 07, 2019 at 05:09:14PM +0000, George Dunlap wrote:
> On Wed, Jan 30, 2019 at 10:37 AM Roger Pau Monne <roger.pau@citrix.com> wrote:
> >
> > There have been several reports of the dom0 builder running out of
> > memory when buildign a PVH dom0 without havingf specified a dom0_mem
> > value. Print a warning message if dom0_mem is not set to a fixed value
> > when booting in PVH mode.
> >
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Cc: Jan Beulich <jbeulich@suse.com>
> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > Cc: Wei Liu <wei.liu2@citrix.com>
> > Cc: Juergen Gross <jgross@suse.com>
> > ---
> > Without this patch creating a PVH dom0 without a dom0_mem parameter
> > can result in the dom0 builder running out of memory thus leading to a
> > Xen crash. The added message gives a hit to the user about a possible
> > fix.
> 
> But why would it run out of memory?  Wouldn't it be better to find and
> fix the root cause?

The root cause is that dom0_compute_nr_pages is not very good at
approximating the internal memory needed to build a PVH dom0: it only
takes into account the RAM regions of the p2m, but not the MMIO
regions, nor the fact that the p2m will likely contain holes.

I've had a patch series to fix this:

https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg00437.html

But I'm afraid we didn't reach consensus with Jan, and at this point in
the release adding a warning message seems like the least controversial
option while I figure out how to proceed in order to fix this. Such a
message will at least give a hint to users about a possible fix.

Thanks, Roger.


* Re: [PATCH 7/8] x86/mm: handle foreign mappings in p2m_entry_modify
  2019-01-30 10:36 ` [PATCH 7/8] x86/mm: handle foreign mappings in p2m_entry_modify Roger Pau Monne
  2019-02-06 16:59   ` Jan Beulich
@ 2019-02-07 17:49   ` George Dunlap
  2019-02-07 17:57     ` Roger Pau Monné
  1 sibling, 1 reply; 62+ messages in thread
From: George Dunlap @ 2019-02-07 17:49 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel
  Cc: Kevin Tian, Wei Liu, Jan Beulich, George Dunlap, Andrew Cooper,
	Tim Deegan, Jun Nakajima

On 1/30/19 10:36 AM, Roger Pau Monne wrote:
> So that the specific handling can be removed from
> atomic_write_ept_entry and be shared with npt and shadow code.
> 
> This commit also removes the check that prevent non-ept PVH dom0 from
> mapping foreign pages.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: Jun Nakajima <jun.nakajima@intel.com>
> Cc: Kevin Tian <kevin.tian@intel.com>
> Cc: Tim Deegan <tim@xen.org>
> ---
>  xen/arch/x86/mm/hap/hap.c       |   3 +-
>  xen/arch/x86/mm/p2m-ept.c       | 108 +++++++-------------------------
>  xen/arch/x86/mm/p2m-pt.c        |   7 ---
>  xen/arch/x86/mm/shadow/common.c |   3 +-
>  xen/include/asm-x86/p2m.h       |  30 ++++++++-
>  5 files changed, 53 insertions(+), 98 deletions(-)
> 
> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
> index dc46d5e14f..4f52639be5 100644
> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -735,7 +735,8 @@ hap_write_p2m_entry(struct domain *d, unsigned long gfn, l1_pgentry_t *p,
>      }
>  
>      p2m_entry_modify(p2m_get_hostp2m(d), p2m_flags_to_type(l1e_get_flags(new)),
> -                     p2m_flags_to_type(old_flags), level);
> +                     p2m_flags_to_type(old_flags), l1e_get_mfn(new),
> +                     l1e_get_mfn(*p), level);
>  
>      safe_write_pte(p, new);
>      if ( old_flags & _PAGE_PRESENT )
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index 0ece6608cb..2b0c3ab265 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -45,65 +45,13 @@ static inline bool_t is_epte_valid(ept_entry_t *e)
>      return ((e->epte & ~(1ul << 63)) != 0 && e->sa_p2mt != p2m_invalid);
>  }
>  
> -/* returns : 0 for success, -errno otherwise */
> -static int atomic_write_ept_entry(struct p2m_domain *p2m,
> -                                  ept_entry_t *entryptr, ept_entry_t new,
> -                                  int level)
> +static void atomic_write_ept_entry(struct p2m_domain *p2m,
> +                                   ept_entry_t *entryptr, ept_entry_t new,
> +                                   int level)
>  {
> -    int rc;
> -    unsigned long oldmfn = mfn_x(INVALID_MFN);
> -    bool_t check_foreign = (new.mfn != entryptr->mfn ||
> -                            new.sa_p2mt != entryptr->sa_p2mt);
> -
> -    if ( level )
> -    {
> -        ASSERT(!is_epte_superpage(&new) || !p2m_is_foreign(new.sa_p2mt));
> -        write_atomic(&entryptr->epte, new.epte);
> -        return 0;
> -    }
> -
> -    if ( unlikely(p2m_is_foreign(new.sa_p2mt)) )
> -    {
> -        rc = -EINVAL;
> -        if ( !is_epte_present(&new) )
> -                goto out;
> -
> -        if ( check_foreign )
> -        {
> -            struct domain *fdom;
> -
> -            if ( !mfn_valid(_mfn(new.mfn)) )
> -                goto out;
> -
> -            rc = -ESRCH;
> -            fdom = page_get_owner(mfn_to_page(_mfn(new.mfn)));
> -            if ( fdom == NULL )
> -                goto out;
> -
> -            /* get refcount on the page */
> -            rc = -EBUSY;
> -            if ( !get_page(mfn_to_page(_mfn(new.mfn)), fdom) )
> -                goto out;
> -        }
> -    }
> -
> -    if ( unlikely(p2m_is_foreign(entryptr->sa_p2mt)) && check_foreign )
> -        oldmfn = entryptr->mfn;
> -
> -    p2m_entry_modify(p2m, new.sa_p2mt, entryptr->sa_p2mt, level);
> -
> +    p2m_entry_modify(p2m, new.sa_p2mt, entryptr->sa_p2mt, _mfn(new.mfn),
> +                     _mfn(entryptr->mfn), level);
>      write_atomic(&entryptr->epte, new.epte);
> -
> -    if ( unlikely(oldmfn != mfn_x(INVALID_MFN)) )
> -        put_page(mfn_to_page(_mfn(oldmfn)));
> -
> -    rc = 0;
> -
> - out:
> -    if ( rc )
> -        gdprintk(XENLOG_ERR, "epte o:%"PRIx64" n:%"PRIx64" rc:%d\n",
> -                 entryptr->epte, new.epte, rc);
> -    return rc;
>  }

This is pretty awesome. :-)

> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> index 834d49d2d4..1cc8acb3fe 100644
> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -933,9 +933,12 @@ struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
>                                                unsigned int *flags);
>  
>  static inline void p2m_entry_modify(struct p2m_domain *p2m, p2m_type_t nt,
> -                                    p2m_type_t ot, unsigned int level)
> +                                    p2m_type_t ot, mfn_t nfn, mfn_t ofn,
> +                                    unsigned int level)
>  {
> -    if ( level != 1 || nt == ot )
> +    struct page_info *pg;
> +
> +    if ( level != 1 || (nt == ot && mfn_eq(nfn, ofn)) )
>          return;

Are you sure that foreign mappings (or ioreq server pages, for that
matter) can never be level > 1?

ioreq server pages may be relatively harmless if we get out of sync; but
the reference count with the foreign mapping is really dangerous if it
gets screwed up.

I'd be tempted to say that we should BUG_ON(level > 1 && nt == foreign).

>  
>      switch ( nt )
> @@ -948,6 +951,17 @@ static inline void p2m_entry_modify(struct p2m_domain *p2m, p2m_type_t nt,
>          p2m->ioreq.entry_count++;
>          break;
>  
> +    case p2m_map_foreign:
> +        pg = mfn_to_page(nfn);
> +
> +        if ( !pg || !page_get_owner_and_reference(pg) )
> +        {
> +            ASSERT_UNREACHABLE();
> +            return;
> +        }

Similarly, I'd be tempted to say we should BUG_ON() here instead.  If a
rogue guest can trigger this path, it would be a DoS; but the alternate
would be to allow the mfn to be put into the p2m table without a
reference, which could potentially be far worse.

The alternate would be to have this return an error value, which would
1) cause the p2m write to fail, and 2) be checked all the way up the chain.
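
Roughly (only a sketch; the signature change would of course have to be
propagated through p2m_entry_modify()'s callers):

    case p2m_map_foreign:
        pg = mfn_to_page(nfn);

        if ( !pg || !page_get_owner_and_reference(pg) )
            return -EBUSY;
        break;

with the function returning 0 on success.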

Less worried about the removal side, as  if we have BUG_ON's on the
insertion side, they *really* shouldn't happen.

 -George


* Re: [PATCH 7/8] x86/mm: handle foreign mappings in p2m_entry_modify
  2019-02-07 17:49   ` George Dunlap
@ 2019-02-07 17:57     ` Roger Pau Monné
  2019-02-07 18:05       ` George Dunlap
  0 siblings, 1 reply; 62+ messages in thread
From: Roger Pau Monné @ 2019-02-07 17:57 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, Wei Liu, Jan Beulich, George Dunlap, Andrew Cooper,
	Tim Deegan, Jun Nakajima, xen-devel

On Thu, Feb 07, 2019 at 05:49:16PM +0000, George Dunlap wrote:
> On 1/30/19 10:36 AM, Roger Pau Monne wrote:
> > diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> > index 834d49d2d4..1cc8acb3fe 100644
> > --- a/xen/include/asm-x86/p2m.h
> > +++ b/xen/include/asm-x86/p2m.h
> > @@ -933,9 +933,12 @@ struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
> >                                                unsigned int *flags);
> >  
> >  static inline void p2m_entry_modify(struct p2m_domain *p2m, p2m_type_t nt,
> > -                                    p2m_type_t ot, unsigned int level)
> > +                                    p2m_type_t ot, mfn_t nfn, mfn_t ofn,
> > +                                    unsigned int level)
> >  {
> > -    if ( level != 1 || nt == ot )
> > +    struct page_info *pg;
> > +
> > +    if ( level != 1 || (nt == ot && mfn_eq(nfn, ofn)) )
> >          return;
> 
> Are you sure that foreign mappings (or ioreq server pages, for that
> matter) can never be level > 1?

Not given the current Xen interface, see
XENMEM_add_to_physmap{_batch}. This will have to change if the
interface is expanded to allow 2M or 1G mappings.

> ioreq server pages may be relatively harmless if we get out of sync; but
> the reference count with the foreign mapping is really dangerous if it
> gets screwed up.
> 
> I'd be tempted to say that we should BUG_ON(level > 1 && nt == foreign).

Yes, I think that's a sensible safety belt.

> 
> >  
> >      switch ( nt )
> > @@ -948,6 +951,17 @@ static inline void p2m_entry_modify(struct p2m_domain *p2m, p2m_type_t nt,
> >          p2m->ioreq.entry_count++;
> >          break;
> >  
> > +    case p2m_map_foreign:
> > +        pg = mfn_to_page(nfn);
> > +
> > +        if ( !pg || !page_get_owner_and_reference(pg) )
> > +        {
> > +            ASSERT_UNREACHABLE();
> > +            return;
> > +        }
> 
> Similarly, I'd be tempted to say we should BUG_ON() here instead.  If a
> rogue guest can trigger this path, it would be a DoS; but the alternate
> would be to allow the mfn to be put into the p2m table without a
> reference, which could potentially be far worse.
> 
> The alternate would be to have this return an error value, which would
> 1) cause the p2m write to fail, and 2) be checked all the way up the chain.
> 
> Less worried about the removal side, as  if we have BUG_ON's on the
> insertion side, they *really* shouldn't happen.

I would go for the BUG_ON ATM, because it's a simpler solution and
callers of p2m_entry_modify should make sure the operation is correct.

Thanks, Roger.


* Re: [PATCH 7/8] x86/mm: handle foreign mappings in p2m_entry_modify
  2019-02-07 17:57     ` Roger Pau Monné
@ 2019-02-07 18:05       ` George Dunlap
  2019-02-08  9:37         ` Roger Pau Monné
  0 siblings, 1 reply; 62+ messages in thread
From: George Dunlap @ 2019-02-07 18:05 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Kevin Tian, Wei Liu, Jan Beulich, George Dunlap, Andrew Cooper,
	Tim Deegan, Jun Nakajima, xen-devel

On 2/7/19 5:57 PM, Roger Pau Monné wrote:
> On Thu, Feb 07, 2019 at 05:49:16PM +0000, George Dunlap wrote:
>> On 1/30/19 10:36 AM, Roger Pau Monne wrote:
>>> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
>>> index 834d49d2d4..1cc8acb3fe 100644
>>> --- a/xen/include/asm-x86/p2m.h
>>> +++ b/xen/include/asm-x86/p2m.h
>>> @@ -933,9 +933,12 @@ struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
>>>                                                unsigned int *flags);
>>>  
>>>  static inline void p2m_entry_modify(struct p2m_domain *p2m, p2m_type_t nt,
>>> -                                    p2m_type_t ot, unsigned int level)
>>> +                                    p2m_type_t ot, mfn_t nfn, mfn_t ofn,
>>> +                                    unsigned int level)
>>>  {
>>> -    if ( level != 1 || nt == ot )
>>> +    struct page_info *pg;
>>> +
>>> +    if ( level != 1 || (nt == ot && mfn_eq(nfn, ofn)) )
>>>          return;
>>
>> Are you sure that foreign mappings (or ioreq server pages, for that
>> matter) can never be level > 1?
> 
> Not given the current Xen interface, see
> XENMEM_add_to_physmap{_batch}. This will have to change if the
> interface is expanded to allow 2M or 1G mappings.

Right; but the question really meant to say: If such an interface
expansion happened, are you sure the person who did it would remember to
handle larger pages down here?

I think we need at least an ASSERT() for ioreq_server types as well
(although we might want to take a look to see what badness could happen
if the count was out of sync -- adding that condition to the BUG_ON()
might be necessary as well).
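
E.g. (just a sketch):

    BUG_ON(level > 1 &&
           (nt == p2m_map_foreign || nt == p2m_ioreq_server));

if an out-of-sync entry_count turns out to be dangerous enough to warrant
BUG_ON() rather than ASSERT().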

>> The alternate would be to have this return an error value, which would
>> 1) cause the p2m write to fail, and 2) be checked all the way up the chain.
>>
>> Less worried about the removal side, as  if we have BUG_ON's on the
>> insertion side, they *really* shouldn't happen.
> 
> I would go for the BUG_ON ATM, because it's a simpler solution and
> callers of p2m_entry_modify should make sure the operation is correct.

Ack.

 -George


* Re: [PATCH 7/8] x86/mm: handle foreign mappings in p2m_entry_modify
  2019-02-07 18:05       ` George Dunlap
@ 2019-02-08  9:37         ` Roger Pau Monné
  0 siblings, 0 replies; 62+ messages in thread
From: Roger Pau Monné @ 2019-02-08  9:37 UTC (permalink / raw)
  To: George Dunlap
  Cc: Kevin Tian, Wei Liu, Jan Beulich, George Dunlap, Andrew Cooper,
	Tim Deegan, Jun Nakajima, xen-devel

On Thu, Feb 07, 2019 at 06:05:24PM +0000, George Dunlap wrote:
> On 2/7/19 5:57 PM, Roger Pau Monné wrote:
> > On Thu, Feb 07, 2019 at 05:49:16PM +0000, George Dunlap wrote:
> >> On 1/30/19 10:36 AM, Roger Pau Monne wrote:
> >>> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> >>> index 834d49d2d4..1cc8acb3fe 100644
> >>> --- a/xen/include/asm-x86/p2m.h
> >>> +++ b/xen/include/asm-x86/p2m.h
> >>> @@ -933,9 +933,12 @@ struct hvm_ioreq_server *p2m_get_ioreq_server(struct domain *d,
> >>>                                                unsigned int *flags);
> >>>  
> >>>  static inline void p2m_entry_modify(struct p2m_domain *p2m, p2m_type_t nt,
> >>> -                                    p2m_type_t ot, unsigned int level)
> >>> +                                    p2m_type_t ot, mfn_t nfn, mfn_t ofn,
> >>> +                                    unsigned int level)
> >>>  {
> >>> -    if ( level != 1 || nt == ot )
> >>> +    struct page_info *pg;
> >>> +
> >>> +    if ( level != 1 || (nt == ot && mfn_eq(nfn, ofn)) )
> >>>          return;
> >>
> >> Are you sure that foreign mappings (or ioreq server pages, for that
> >> matter) can never be level > 1?
> > 
> > Not given the current Xen interface, see
> > XENMEM_add_to_physmap{_batch}. This will have to change if the
> > interface is expanded to allow 2M or 1G mappings.
> 
> Right; but the question really meant to say: If such an interface
> expansion happened, are you sure the person who did it would remember to
> handle larger pages down here?

I've added:

BUG_ON(level > 1 && (nt == p2m_ioreq_server || nt == p2m_map_foreign));

at the top of the function. This will catch any attempt to add such a
mapping with an order different from 4K.

Thanks, Roger.


* Re: [PATCH for-4.12 5/8] pvh/dom0: warn when dom0_mem is not set to a fixed value
  2019-02-07 17:48     ` Roger Pau Monné
@ 2019-02-08 11:11       ` George Dunlap
  0 siblings, 0 replies; 62+ messages in thread
From: George Dunlap @ 2019-02-08 11:11 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Juergen Gross, xen-devel, Wei Liu, Jan Beulich, Andrew Cooper

On Thu, Feb 7, 2019 at 5:48 PM Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> On Thu, Feb 07, 2019 at 05:09:14PM +0000, George Dunlap wrote:
> > On Wed, Jan 30, 2019 at 10:37 AM Roger Pau Monne <roger.pau@citrix.com> wrote:
> > >
> > > There have been several reports of the dom0 builder running out of
> > > memory when buildign a PVH dom0 without havingf specified a dom0_mem
> > > value. Print a warning message if dom0_mem is not set to a fixed value
> > > when booting in PVH mode.
> > >
> > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > > ---
> > > Cc: Jan Beulich <jbeulich@suse.com>
> > > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > > Cc: Wei Liu <wei.liu2@citrix.com>
> > > Cc: Juergen Gross <jgross@suse.com>
> > > ---
> > > Without this patch creating a PVH dom0 without a dom0_mem parameter
> > > can result in the dom0 builder running out of memory thus leading to a
> > > Xen crash. The added message gives a hit to the user about a possible
> > > fix.
> >
> > But why would it run out of memory?  Wouldn't it be better to find and
> > fix the root cause?
>
> The root cause is that dom0_compute_nr_pages is not very good at
> approximating the internal memory needed to build a PVH dom0: it only
> takes into account the RAM regions of the p2m, but not the MMIO
> regions, nor the fact that the p2m will likely contain holes.
>
> I've had a patch series to fix this:
>
> https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg00437.html
>
> But I'm afraid we didn't reach consensus with Jan, and at this point in
> the release adding a warning message seems like the least controversial
> option while I figure out how to proceed in order to fix this. Such a
> message will at least give a hint to users about a possible fix.

OK; then both the changelog and the code (via a comment) should
clearly indicate that this is a temporary work-around; and ideally the
log message too.

What about something like the following? "WARNING: PVH dom0 without
dom0_mem set is still unstable.  If you get crashes during boot, try
adding a dom0_mem parameter."

Thanks,
 -George


* Re: [PATCH for-4.12 2/8] amd/ntp: remove assert that prevents creating 2M MMIO entries
  2019-02-07 17:21               ` George Dunlap
@ 2019-02-08 17:49                 ` Roger Pau Monné
  2019-02-11  9:47                   ` Jan Beulich
  0 siblings, 1 reply; 62+ messages in thread
From: Roger Pau Monné @ 2019-02-08 17:49 UTC (permalink / raw)
  To: George Dunlap
  Cc: Juergen Gross, Wei Liu, George Dunlap, Andrew Cooper,
	Jan Beulich, xen-devel

On Thu, Feb 07, 2019 at 05:21:28PM +0000, George Dunlap wrote:
> On 2/5/19 1:38 PM, Roger Pau Monné wrote:
> > On Tue, Feb 05, 2019 at 05:44:14AM -0700, Jan Beulich wrote:
> >>>>> On 05.02.19 at 11:40, <roger.pau@citrix.com> wrote:
> >>> On Tue, Feb 05, 2019 at 12:45:56AM -0700, Jan Beulich wrote:
> >>>>>>> On 04.02.19 at 18:18, <roger.pau@citrix.com> wrote:
> >>>>> On Mon, Feb 04, 2019 at 09:56:22AM -0700, Jan Beulich wrote:
> >>>>>>>>> On 30.01.19 at 11:36, <roger.pau@citrix.com> wrote:
> >>>>>>> The assert was originally added to make sure that higher order
> >>>>>>> regions (> PAGE_ORDER_4K) could not be used to bypass the
> >>>>>>> mmio_ro_ranges check performed by p2m_type_to_flags.
> >>>>>>>
> >>>>>>> This however is already checked in set_mmio_p2m_entry, which makes
> >>>>>>> sure that higher order mappings don't overlap with mmio_ro_ranges,
> >>>>>>> thus allowing the creation of high order MMIO mappings safely.
> >>>>>>
> >>>>>> Well, the assertions were added to make sure no other code
> >>>>>> path appears that violates this requirement. Arguably e.g.
> >>>>>> set_identity_p2m_entry() could gain an order parameter and
> >>>>>> then try to establish larger p2m_mmio_direct entries.
> >>>>>>
> >>>>>> Don't get me wrong, I don't object to the removal of the
> >>>>>> assertions, but the description makes it sound as if they were
> >>>>>> entirely redundant. Even better would be though if they
> >>>>>> could be extended to keep triggering in "bad" cases.
> >>>>>
> >>>>> I could add something like:
> >>>>>
> >>>>> ASSERT(!rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
> >>>>>                                 mfn_x(mfn) + PFN_DOWN(MB(2))));
> >>>>>
> >>>>> I think this should be safe and would trigger in case of misuse.
> >>>>
> >>>> Looks okay, if slightly extended (or made conditional) to exclude
> >>>> the addition of MB(2) to MFN_INVALID to wrap and potentially
> >>>> hit a r/o range in the low 1Mb.
> >>>
> >>> Ack, so it would be:
> >>>
> >>> ASSERT(mfn_eq(mfn, INVALID_MFN) ||
> >>>        !rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
> >>>                                 mfn_x(mfn) + PFN_DOWN(MB(2))));
> >>
> >> But that's still dropping the other aspect of the original ASSERT():
> >>
> >>>>>>> --- a/xen/arch/x86/mm/p2m-pt.c
> >>>>>>> +++ b/xen/arch/x86/mm/p2m-pt.c
> >>>>>>> @@ -668,7 +668,6 @@ p2m_pt_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn,
> >>>>>>>          }
> >>>>>>>  
> >>>>>>>          ASSERT(p2m_flags_to_type(flags) != p2m_ioreq_server);
> >>>>>>> -        ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
> >>
> >> It also made sure that "valid" MFNs can't be used for mappings with
> >> p2m_mmio_direct type. Except that I realize now that this is wrong in
> >> certain cases, because MMIO pages may actually have "valid" MFNs.
> >> mfn_valid(), after all, only tells us whether there's a struct page_info
> >> for the MFN. I wonder if it's really this brokenness that you hit,
> >> rather than what is explained in the description.
> >>
> >> When the assertion was introduced, MMIO wasn't handled by the
> >> code correctly anyway (!mfn_valid() MFNs would not have got any
> >> mappings at all in the 2M and 1G paths), whereas now we have
> >> p2m_allows_invalid_mfn() there. So the situation has become worse
> >> with other nearby changes. As a result I think we want to correct
> >> the assertion here alongside the addition of what you suggest
> >> above. What about
> >>
> >>     if ( p2mt != p2m_mmio_direct )
> >>         ASSERT(mfn_valid(mfn) || (mfn_eq(mfn, INVALID_MFN) &&
> >>                p2m_allows_invalid_mfn(p2mt)));
> >>     else
> >>         ASSERT(!mfn_eq(mfn, INVALID_MFN) &&
> >>                !rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
> >>                                         mfn_x(mfn) + PFN_DOWN(MB(2))));
> 
> FWIW I agree with this approach (asserting !overlaps for p2m_mmio_direct
> types).

Seeing the report from Sandler:

https://lists.xenproject.org/archives/html/xen-devel/2019-02/msg00578.html

Looks like the assert on the if branch in the above example is not
correct: the code allows for invalid mfns even if
p2m_allows_invalid_mfn returns false, by using l2e_empty.

I think the correct asserts would be:

if ( p2mt == p2m_mmio_direct )
    ASSERT(!mfn_eq(mfn, INVALID_MFN) &&
           !rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
                                    mfn_x(mfn) + PFN_DOWN(MB(2))));
else
    ASSERT(mfn_valid(mfn) || mfn_eq(mfn, INVALID_MFN));

Roger.


* Re: [PATCH for-4.12 2/8] amd/ntp: remove assert that prevents creating 2M MMIO entries
  2019-02-08 17:49                 ` Roger Pau Monné
@ 2019-02-11  9:47                   ` Jan Beulich
  2019-02-11 12:03                     ` Roger Pau Monné
  0 siblings, 1 reply; 62+ messages in thread
From: Jan Beulich @ 2019-02-11  9:47 UTC (permalink / raw)
  To: george.dunlap, Roger Pau Monne
  Cc: George Dunlap, Andrew Cooper, Wei Liu, Juergen Gross, xen-devel

>>> On 08.02.19 at 18:49, <roger.pau@citrix.com> wrote:
> On Thu, Feb 07, 2019 at 05:21:28PM +0000, George Dunlap wrote:
>> On 2/5/19 1:38 PM, Roger Pau Monné wrote:
>> > On Tue, Feb 05, 2019 at 05:44:14AM -0700, Jan Beulich wrote:
>> >> When the assertion was introduced, MMIO wasn't handled by the
>> >> code correctly anyway (!mfn_valid() MFNs would not have got any
>> >> mappings at all in the 2M and 1G paths), whereas now we have
>> >> p2m_allows_invalid_mfn() there. So the situation has become worse
>> >> with other nearby changes. As a result I think we want to correct
>> >> the assertion here alongside the addition of what you suggest
>> >> above. What about
>> >>
>> >>     if ( p2mt != p2m_mmio_direct )
>> >>         ASSERT(mfn_valid(mfn) || (mfn_eq(mfn, INVALID_MFN) &&
>> >>                p2m_allows_invalid_mfn(p2mt)));
>> >>     else
>> >>         ASSERT(!mfn_eq(mfn, INVALID_MFN) &&
>> >>                !rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
>> >>                                         mfn_x(mfn) + PFN_DOWN(MB(2))));
>> 
>> FWIW I agree with this approach (asserting !overlaps for p2m_mmio_direct
>> types).
> 
> Seeing the report from Sandler:
> 
> https://lists.xenproject.org/archives/html/xen-devel/2019-02/msg00578.html 
> 
> Looks like the assert on the if branch in the above example is not
> correct: the code allows for invalid mfns even if
> p2m_allows_invalid_mfn returns false, by using l2e_empty.
> 
> I think the correct asserts would be:
> 
> if ( p2mt == p2m_mmio_direct )
>     ASSERT(!mfn_eq(mfn, INVALID_MFN) &&
>            !rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
>                                     mfn_x(mfn) + PFN_DOWN(MB(2))));
> else
>     ASSERT(mfn_valid(mfn) || mfn_eq(mfn, INVALID_MFN));

Hmm, perhaps this is good enough as an assertion, but I'd like
the fixed one to at least be considered:

if ( p2mt == p2m_mmio_direct )
    ASSERT(!mfn_eq(mfn, INVALID_MFN) &&
           !rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
                                    mfn_x(mfn) + PFN_DOWN(MB(2))));
else if ( p2m_allows_invalid_mfn(p2mt) || p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
    ASSERT(mfn_eq(mfn, INVALID_MFN));
else
    ASSERT(mfn_valid(mfn));

Jan



* Re: [PATCH for-4.12 2/8] amd/ntp: remove assert that prevents creating 2M MMIO entries
  2019-02-11  9:47                   ` Jan Beulich
@ 2019-02-11 12:03                     ` Roger Pau Monné
       [not found]                       ` <7BBE0D330200008C0063616D@prv1-mh.provo.novell.com>
  0 siblings, 1 reply; 62+ messages in thread
From: Roger Pau Monné @ 2019-02-11 12:03 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, Wei Liu, George Dunlap, Andrew Cooper,
	george.dunlap, xen-devel

On Mon, Feb 11, 2019 at 02:47:48AM -0700, Jan Beulich wrote:
> >>> On 08.02.19 at 18:49, <roger.pau@citrix.com> wrote:
> > On Thu, Feb 07, 2019 at 05:21:28PM +0000, George Dunlap wrote:
> >> On 2/5/19 1:38 PM, Roger Pau Monné wrote:
> >> > On Tue, Feb 05, 2019 at 05:44:14AM -0700, Jan Beulich wrote:
> >> >> When the assertion was introduced, MMIO wasn't handled by the
> >> >> code correctly anyway (!mfn_valid() MFNs would not have got any
> >> >> mappings at all in the 2M and 1G paths), whereas now we have
> >> >> p2m_allows_invalid_mfn() there. So the situation has become worse
> >> >> with other nearby changes. As a result I think we want to correct
> >> >> the assertion here alongside the addition of what you suggest
> >> >> above. What about
> >> >>
> >> >>     if ( p2mt != p2m_mmio_direct )
> >> >>         ASSERT(mfn_valid(mfn) || (mfn_eq(mfn, INVALID_MFN) &&
> >> >>                p2m_allows_invalid_mfn(p2mt)));
> >> >>     else
> >> >>         ASSERT(!mfn_eq(mfn, INVALID_MFN) &&
> >> >>                !rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
> >> >>                                         mfn_x(mfn) + PFN_DOWN(MB(2))));
> >> 
> >> FWIW I agree with this approach (asserting !overlaps for p2m_mmio_direct
> >> types).
> > 
> > Seeing the report from Sandler:
> > 
> > https://lists.xenproject.org/archives/html/xen-devel/2019-02/msg00578.html 
> > 
> > Looks like the assert on the if branch in the above example is not
> > correct, the code allows for invalid mfns even if
> > p2m_allows_invalid_mfn return false by using l2e_empty.
> > 
> > I think the correct asserts would be:
> > 
> > if ( p2mt == p2m_mmio_direct )
> >     ASSERT(!mfn_eq(mfn, INVALID_MFN) &&
> >            !rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
> >                                     mfn_x(mfn) + PFN_DOWN(MB(2))));
> > else
> >     ASSERT(mfn_valid(mfn) || mfn_eq(mfn, INVALID_MFN));
> 
> Hmm, perhaps this is good enough as an assertion, but I'd like
> the fixed one to at least be considered:
> 
> if ( p2mt == p2m_mmio_direct )
>     ASSERT(!mfn_eq(mfn, INVALID_MFN) &&
>            !rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
>                                     mfn_x(mfn) + PFN_DOWN(MB(2))));
> else if ( p2m_allows_invalid_mfn(p2mt) || p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
>     ASSERT(mfn_eq(mfn, INVALID_MFN));

I'm not sure this is correct: the fact that certain types (PoD or
paged memory) allow for invalid mfns doesn't mean that the mfn must
always be invalid, as the above condition and assert assume, does it?

Ie: I think the assert should be adjusted to:

ASSERT(mfn_eq(mfn, INVALID_MFN) || mfn_valid(mfn));

Thanks, Roger.


* Re: [PATCH for-4.12 2/8] amd/ntp: remove assert that prevents creating 2M MMIO entries
       [not found]                       ` <7BBE0D330200008C0063616D@prv1-mh.provo.novell.com>
@ 2019-02-11 12:11                         ` Jan Beulich
  0 siblings, 0 replies; 62+ messages in thread
From: Jan Beulich @ 2019-02-11 12:11 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Juergen Gross, Wei Liu, George Dunlap, Andrew Cooper,
	george.dunlap, xen-devel

>>> On 11.02.19 at 13:03, <roger.pau@citrix.com> wrote:
> On Mon, Feb 11, 2019 at 02:47:48AM -0700, Jan Beulich wrote:
>> >>> On 08.02.19 at 18:49, <roger.pau@citrix.com> wrote:
>> > On Thu, Feb 07, 2019 at 05:21:28PM +0000, George Dunlap wrote:
>> >> On 2/5/19 1:38 PM, Roger Pau Monné wrote:
>> >> > On Tue, Feb 05, 2019 at 05:44:14AM -0700, Jan Beulich wrote:
>> >> >> When the assertion was introduced, MMIO wasn't handled by the
>> >> >> code correctly anyway (!mfn_valid() MFNs would not have got any
>> >> >> mappings at all in the 2M and 1G paths), whereas now we have
>> >> >> p2m_allows_invalid_mfn() there. So the situation has become worse
>> >> >> with other nearby changes. As a result I think we want to correct
>> >> >> the assertion here alongside the addition of what you suggest
>> >> >> above. What about
>> >> >>
>> >> >>     if ( p2mt != p2m_mmio_direct )
>> >> >>         ASSERT(mfn_valid(mfn) || (mfn_eq(mfn, INVALID_MFN) &&
>> >> >>                p2m_allows_invalid_mfn(p2mt)));
>> >> >>     else
>> >> >>         ASSERT(!mfn_eq(mfn, INVALID_MFN) &&
>> >> >>                !rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
>> >> >>                                         mfn_x(mfn) + PFN_DOWN(MB(2))));
>> >> 
>> >> FWIW I agree with this approach (asserting !overlaps for p2m_mmio_direct
>> >> types).
>> > 
>> > Seeing the report from Sandler:
>> > 
>> > https://lists.xenproject.org/archives/html/xen-devel/2019-02/msg00578.html 
>> > 
>> > Looks like the assert on the if branch in the above example is not
>> > correct: the code allows for invalid mfns even if
>> > p2m_allows_invalid_mfn returns false, by using l2e_empty.
>> > 
>> > I think the correct asserts would be:
>> > 
>> > if ( p2mt == p2m_mmio_direct )
>> >     ASSERT(!mfn_eq(mfn, INVALID_MFN) &&
>> >            !rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
>> >                                     mfn_x(mfn) + PFN_DOWN(MB(2))));
>> > else
>> >     ASSERT(mfn_valid(mfn) || mfn_eq(mfn, INVALID_MFN));
>> 
>> Hmm, perhaps this is good enough as an assertion, but I'd like
>> the fixed one to at least be considered:
>> 
>> if ( p2mt == p2m_mmio_direct )
>>     ASSERT(!mfn_eq(mfn, INVALID_MFN) &&
>>            !rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
>>                                     mfn_x(mfn) + PFN_DOWN(MB(2))));
>> else if ( p2m_allows_invalid_mfn(p2mt) || p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
>>     ASSERT(mfn_eq(mfn, INVALID_MFN));
> 
> I'm not sure this is correct: the fact that certain types (PoD or
> paged memory) allow for invalid mfns doesn't mean that the mfn must
> always be invalid, as the above condition and assert assume, does it?

Oh, yes, I think you're right (for some of the paging types at
least; PoD should not be using other than INVALID_MFN anymore).

Jan



end of thread

Thread overview: 62+ messages
2019-01-30 10:36 [PATCH for-4.12 0/8] pvh/dom0/shadow/amd fixes Roger Pau Monne
2019-01-30 10:36 ` [PATCH for-4.12 1/8] dom0/pvh: align allocation and mapping order to start address Roger Pau Monne
2019-01-30 12:37   ` Wei Liu
2019-01-30 13:58     ` Roger Pau Monné
2019-01-31 17:22       ` Wei Liu
2019-02-04 16:41   ` Jan Beulich
2019-02-04 17:11     ` Roger Pau Monné
2019-02-05  7:42       ` Jan Beulich
2019-02-05 10:26         ` Roger Pau Monné
2019-02-05 11:38           ` Jan Beulich
2019-02-05 10:54         ` Andrew Cooper
2019-02-05 11:37           ` Jan Beulich
2019-01-30 10:36 ` [PATCH for-4.12 2/8] amd/ntp: remove assert that prevents creating 2M MMIO entries Roger Pau Monne
2019-02-04 16:48   ` Andrew Cooper
2019-02-04 16:56   ` Jan Beulich
2019-02-04 17:18     ` Roger Pau Monné
2019-02-05  7:45       ` Jan Beulich
2019-02-05 10:40         ` Roger Pau Monné
2019-02-05 12:44           ` Jan Beulich
2019-02-05 13:38             ` Roger Pau Monné
2019-02-05 14:55               ` Jan Beulich
2019-02-07 17:21               ` George Dunlap
2019-02-08 17:49                 ` Roger Pau Monné
2019-02-11  9:47                   ` Jan Beulich
2019-02-11 12:03                     ` Roger Pau Monné
     [not found]                       ` <7BBE0D330200008C0063616D@prv1-mh.provo.novell.com>
2019-02-11 12:11                         ` Jan Beulich
2019-01-30 10:36 ` [PATCH for-4.12 3/8] iommu/pvh: add reserved regions below 1MB to the iommu page tables Roger Pau Monne
2019-02-05 10:47   ` Jan Beulich
2019-02-05 11:15     ` Roger Pau Monné
2019-02-05 12:49       ` Jan Beulich
2019-02-05 14:01         ` Roger Pau Monné
2019-02-05 15:18           ` Jan Beulich
2019-02-05 15:45             ` Roger Pau Monné
2019-01-30 10:36 ` [PATCH for-4.12 4/8] x86/shadow: alloc enough pages so initialization doesn't fail Roger Pau Monne
2019-02-05 11:21   ` Jan Beulich
2019-02-05 11:47     ` Roger Pau Monné
2019-02-05 12:55       ` Jan Beulich
2019-02-05 13:52         ` Roger Pau Monné
2019-02-05 15:15           ` Jan Beulich
2019-02-05 15:53             ` Roger Pau Monné
2019-02-05 17:32               ` Jan Beulich
2019-02-06  9:10                 ` Roger Pau Monné
2019-02-06 11:02                   ` Jan Beulich
2019-01-30 10:36 ` [PATCH for-4.12 5/8] pvh/dom0: warn when dom0_mem is not set to a fixed value Roger Pau Monne
2019-02-06 13:54   ` Jan Beulich
2019-02-07 15:39     ` Roger Pau Monné
2019-02-07 16:47       ` Jan Beulich
2019-02-07 17:09   ` George Dunlap
2019-02-07 17:48     ` Roger Pau Monné
2019-02-08 11:11       ` George Dunlap
2019-01-30 10:36 ` [PATCH 6/8] x86/mm: split p2m ioreq server pages special handling into helper Roger Pau Monne
2019-01-31 14:59   ` Paul Durrant
2019-01-31 16:58     ` Roger Pau Monné
2019-01-30 10:36 ` [PATCH 7/8] x86/mm: handle foreign mappings in p2m_entry_modify Roger Pau Monne
2019-02-06 16:59   ` Jan Beulich
2019-02-07 16:53     ` Roger Pau Monné
2019-02-07 16:59       ` Jan Beulich
2019-02-07 17:49   ` George Dunlap
2019-02-07 17:57     ` Roger Pau Monné
2019-02-07 18:05       ` George Dunlap
2019-02-08  9:37         ` Roger Pau Monné
2019-01-30 10:36 ` [PATCH 8/8] npt/shadow: allow getting foreign page table entries Roger Pau Monne
