* [PATCH v7 00/15] switch to domheap for Xen page tables
@ 2020-05-29 11:11 Hongyan Xia
  2020-05-29 11:11 ` [PATCH v7 01/15] x86/mm: map_pages_to_xen would better have one exit path Hongyan Xia
                   ` (14 more replies)
  0 siblings, 15 replies; 32+ messages in thread
From: Hongyan Xia @ 2020-05-29 11:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, julien, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Jan Beulich, Roger Pau Monné

From: Hongyan Xia <hongyxia@amazon.com>

This series rewrites the remaining page-table handling functions and
finally makes the switch from xenheap to domheap for Xen page tables,
so that they no longer rely on the direct map. This is a big step
towards removing the direct map.
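
In rough terms (an illustrative sketch, not code taken from any single
patch), the conversion replaces the first pattern below with the
second throughout the series:

    /* Old (xenheap, relies on the direct map): */
    l2_pgentry_t *l2t = alloc_xen_pagetable();

    if ( !l2t )
        return -ENOMEM;
    clear_page(l2t);
    /* ... use l2t directly ... */
    free_xen_pagetable(l2t);

    /* New (domheap, mapped only while being accessed): */
    mfn_t l2mfn = alloc_xen_pagetable_new();
    l2_pgentry_t *l2t;

    if ( mfn_eq(l2mfn, INVALID_MFN) )
        return -ENOMEM;
    l2t = map_domain_page(l2mfn);
    clear_page(l2t);
    /* ... use l2t through the temporary mapping ... */
    UNMAP_DOMAIN_PAGE(l2t);
    /* ... and, only when the table is actually torn down ... */
    free_xen_pagetable_new(l2mfn);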

This series depends on the following mini-series:
https://lists.xenproject.org/archives/html/xen-devel/2020-04/msg00730.html

---
Changed in v7:
- rebase and cleanup.
- address comments in v6.
- add alloc_map_clear_xen_pt() helper to simplify the patches in this
  series.

Changed in v6:
- drop the patches that have already been merged.
- rebase and cleanup.
- rewrite map_pages_to_xen() and modify_xen_mappings() in a way that
  does not require an end_of_loop goto label.

Hongyan Xia (2):
  x86/mm: drop old page table APIs
  x86: switch to use domheap page for page tables

Wei Liu (13):
  x86/mm: map_pages_to_xen would better have one exit path
  x86/mm: make sure there is one exit path for modify_xen_mappings
  x86/mm: rewrite virt_to_xen_l*e
  x86/mm: switch to new APIs in map_pages_to_xen
  x86/mm: switch to new APIs in modify_xen_mappings
  x86_64/mm: introduce pl2e in paging_init
  x86_64/mm: switch to new APIs in paging_init
  x86_64/mm: switch to new APIs in setup_m2p_table
  efi: use new page table APIs in copy_mapping
  efi: switch to new APIs in EFI code
  x86/smpboot: add exit path for clone_mapping()
  x86/smpboot: switch clone_mapping() to new APIs
  x86/mm: drop _new suffix for page table APIs

 xen/arch/x86/domain_page.c |  11 +-
 xen/arch/x86/efi/runtime.h |  13 +-
 xen/arch/x86/mm.c          | 273 +++++++++++++++++++++++--------------
 xen/arch/x86/setup.c       |   4 +-
 xen/arch/x86/smpboot.c     |  70 ++++++----
 xen/arch/x86/x86_64/mm.c   |  82 ++++++-----
 xen/common/efi/boot.c      |  87 +++++++-----
 xen/common/efi/efi.h       |   3 +-
 xen/common/efi/runtime.c   |   8 +-
 xen/common/vmap.c          |   1 +
 xen/include/asm-x86/mm.h   |   7 +-
 xen/include/asm-x86/page.h |  13 +-
 12 files changed, 354 insertions(+), 218 deletions(-)

-- 
2.24.1.AMZN




* [PATCH v7 01/15] x86/mm: map_pages_to_xen would better have one exit path
  2020-05-29 11:11 [PATCH v7 00/15] switch to domheap for Xen page tables Hongyan Xia
@ 2020-05-29 11:11 ` Hongyan Xia
  2020-05-29 11:11 ` [PATCH v7 02/15] x86/mm: make sure there is one exit path for modify_xen_mappings Hongyan Xia
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Hongyan Xia @ 2020-05-29 11:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, julien, Wei Liu, Jan Beulich, Roger Pau Monné

From: Wei Liu <wei.liu2@citrix.com>

We will soon rewrite the function to handle dynamic mapping and
unmapping of page tables. Since dynamic mappings may map and unmap
pages in different iterations of the while loop, we need to lift pl3e
out of the loop.

No functional change.
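
Condensed for illustration (this merely restates the shape of the
hunks below), the function ends up with a single exit path:

    int rc = -ENOMEM;

    while ( nr_mfns != 0 )
    {
        pl3e = virt_to_xen_l3e(virt);   /* pl3e now lives outside the loop */
        if ( !pl3e )
            goto out;                   /* all failure paths funnel here */
        /* ... */
    }

    rc = 0;

 out:
    return rc;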

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

---
Changed since v4:
- drop the end_of_loop goto label.

Changed since v3:
- remove asserts on rc since rc never gets changed to anything else.
- reword commit message.
---
 xen/arch/x86/mm.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 54980b4eb1..d99f9bc133 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5068,9 +5068,11 @@ int map_pages_to_xen(
     unsigned int flags)
 {
     bool locking = system_state > SYS_STATE_boot;
+    l3_pgentry_t *pl3e, ol3e;
     l2_pgentry_t *pl2e, ol2e;
     l1_pgentry_t *pl1e, ol1e;
     unsigned int  i;
+    int rc = -ENOMEM;
 
 #define flush_flags(oldf) do {                 \
     unsigned int o_ = (oldf);                  \
@@ -5088,10 +5090,11 @@ int map_pages_to_xen(
 
     while ( nr_mfns != 0 )
     {
-        l3_pgentry_t ol3e, *pl3e = virt_to_xen_l3e(virt);
+        pl3e = virt_to_xen_l3e(virt);
 
         if ( !pl3e )
-            return -ENOMEM;
+            goto out;
+
         ol3e = *pl3e;
 
         if ( cpu_has_page1gb &&
@@ -5181,7 +5184,7 @@ int map_pages_to_xen(
 
             l2t = alloc_xen_pagetable();
             if ( l2t == NULL )
-                return -ENOMEM;
+                goto out;
 
             for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
                 l2e_write(l2t + i,
@@ -5210,7 +5213,7 @@ int map_pages_to_xen(
 
         pl2e = virt_to_xen_l2e(virt);
         if ( !pl2e )
-            return -ENOMEM;
+            goto out;
 
         if ( ((((virt >> PAGE_SHIFT) | mfn_x(mfn)) &
                ((1u << PAGETABLE_ORDER) - 1)) == 0) &&
@@ -5254,7 +5257,7 @@ int map_pages_to_xen(
             {
                 pl1e = virt_to_xen_l1e(virt);
                 if ( pl1e == NULL )
-                    return -ENOMEM;
+                    goto out;
             }
             else if ( l2e_get_flags(*pl2e) & _PAGE_PSE )
             {
@@ -5282,7 +5285,7 @@ int map_pages_to_xen(
 
                 l1t = alloc_xen_pagetable();
                 if ( l1t == NULL )
-                    return -ENOMEM;
+                    goto out;
 
                 for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
                     l1e_write(&l1t[i],
@@ -5428,7 +5431,10 @@ int map_pages_to_xen(
 
 #undef flush_flags
 
-    return 0;
+    rc = 0;
+
+ out:
+    return rc;
 }
 
 int populate_pt_range(unsigned long virt, unsigned long nr_mfns)
-- 
2.24.1.AMZN




* [PATCH v7 02/15] x86/mm: make sure there is one exit path for modify_xen_mappings
  2020-05-29 11:11 [PATCH v7 00/15] switch to domheap for Xen page tables Hongyan Xia
  2020-05-29 11:11 ` [PATCH v7 01/15] x86/mm: map_pages_to_xen would better have one exit path Hongyan Xia
@ 2020-05-29 11:11 ` Hongyan Xia
  2020-05-29 11:11 ` [PATCH v7 03/15] x86/mm: rewrite virt_to_xen_l*e Hongyan Xia
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Hongyan Xia @ 2020-05-29 11:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, julien, Wei Liu, Jan Beulich, Roger Pau Monné

From: Wei Liu <wei.liu2@citrix.com>

We will soon need to dynamically map and unmap page tables in this
function. Since dynamic mappings may map and unmap pl3e in different
iterations, lift pl3e out of the loop.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

---
Changed since v4:
- drop the end_of_loop goto label.

Changed since v3:
- remove asserts on rc since it never gets changed to anything else.
---
 xen/arch/x86/mm.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index d99f9bc133..462682ba70 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5457,10 +5457,12 @@ int populate_pt_range(unsigned long virt, unsigned long nr_mfns)
 int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
 {
     bool locking = system_state > SYS_STATE_boot;
+    l3_pgentry_t *pl3e;
     l2_pgentry_t *pl2e;
     l1_pgentry_t *pl1e;
     unsigned int  i;
     unsigned long v = s;
+    int rc = -ENOMEM;
 
     /* Set of valid PTE bits which may be altered. */
 #define FLAGS_MASK (_PAGE_NX|_PAGE_RW|_PAGE_PRESENT)
@@ -5471,7 +5473,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
 
     while ( v < e )
     {
-        l3_pgentry_t *pl3e = virt_to_xen_l3e(v);
+        pl3e = virt_to_xen_l3e(v);
 
         if ( !pl3e || !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
         {
@@ -5504,7 +5506,8 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
             /* PAGE1GB: shatter the superpage and fall through. */
             l2t = alloc_xen_pagetable();
             if ( !l2t )
-                return -ENOMEM;
+                goto out;
+
             for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
                 l2e_write(l2t + i,
                           l2e_from_pfn(l3e_get_pfn(*pl3e) +
@@ -5561,7 +5564,8 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                 /* PSE: shatter the superpage and try again. */
                 l1t = alloc_xen_pagetable();
                 if ( !l1t )
-                    return -ENOMEM;
+                    goto out;
+
                 for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
                     l1e_write(&l1t[i],
                               l1e_from_pfn(l2e_get_pfn(*pl2e) + i,
@@ -5694,7 +5698,10 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
     flush_area(NULL, FLUSH_TLB_GLOBAL);
 
 #undef FLAGS_MASK
-    return 0;
+    rc = 0;
+
+ out:
+    return rc;
 }
 
 #undef flush_area
-- 
2.24.1.AMZN




* [PATCH v7 03/15] x86/mm: rewrite virt_to_xen_l*e
  2020-05-29 11:11 [PATCH v7 00/15] switch to domheap for Xen page tables Hongyan Xia
  2020-05-29 11:11 ` [PATCH v7 01/15] x86/mm: map_pages_to_xen would better have one exit path Hongyan Xia
  2020-05-29 11:11 ` [PATCH v7 02/15] x86/mm: make sure there is one exit path for modify_xen_mappings Hongyan Xia
@ 2020-05-29 11:11 ` Hongyan Xia
  2020-07-14 10:47   ` Jan Beulich
  2020-05-29 11:11 ` [PATCH v7 04/15] x86/mm: switch to new APIs in map_pages_to_xen Hongyan Xia
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Hongyan Xia @ 2020-05-29 11:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, julien, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, Jan Beulich, Roger Pau Monné

From: Wei Liu <wei.liu2@citrix.com>

Rewrite those functions to use the new APIs. Modify their callers to
unmap the pointer returned. Since alloc_xen_pagetable_new() is almost
never useful unless accompanied by page clearing and a mapping,
introduce a helper alloc_map_clear_xen_pt() for this sequence.

Note that the change to virt_to_xen_l1e() also requires vmap_to_mfn()
to unmap the page, which in turn requires the domain_page.h header in
vmap.c.
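
As an illustration of the new calling convention (a sketch, not a hunk
from this patch), a caller of virt_to_xen_l3e() now looks like:

    l3_pgentry_t *pl3e = virt_to_xen_l3e(virt);

    if ( !pl3e )
        return -ENOMEM;
    /* ... read or update *pl3e ... */
    unmap_domain_page(pl3e);

The same applies to virt_to_xen_l2e() and virt_to_xen_l1e().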

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

---
Changed in v7:
- remove a comment.
- use l1e_get_mfn() instead of converting things back and forth.
- add alloc_map_clear_xen_pt().
- unmap before the next mapping to reduce mapcache pressure.
- use normal unmap calls instead of the macro in error paths because
  unmap can handle NULL now.
---
 xen/arch/x86/domain_page.c | 11 +++--
 xen/arch/x86/mm.c          | 96 +++++++++++++++++++++++++++-----------
 xen/common/vmap.c          |  1 +
 xen/include/asm-x86/mm.h   |  1 +
 xen/include/asm-x86/page.h |  8 +++-
 5 files changed, 86 insertions(+), 31 deletions(-)

diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index b03728e18e..dc8627c1b5 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -333,21 +333,24 @@ void unmap_domain_page_global(const void *ptr)
 mfn_t domain_page_map_to_mfn(const void *ptr)
 {
     unsigned long va = (unsigned long)ptr;
-    const l1_pgentry_t *pl1e;
+    l1_pgentry_t l1e;
 
     if ( va >= DIRECTMAP_VIRT_START )
         return _mfn(virt_to_mfn(ptr));
 
     if ( va >= VMAP_VIRT_START && va < VMAP_VIRT_END )
     {
-        pl1e = virt_to_xen_l1e(va);
+        const l1_pgentry_t *pl1e = virt_to_xen_l1e(va);
+
         BUG_ON(!pl1e);
+        l1e = *pl1e;
+        unmap_domain_page(pl1e);
     }
     else
     {
         ASSERT(va >= MAPCACHE_VIRT_START && va < MAPCACHE_VIRT_END);
-        pl1e = &__linear_l1_table[l1_linear_offset(va)];
+        l1e = __linear_l1_table[l1_linear_offset(va)];
     }
 
-    return l1e_get_mfn(*pl1e);
+    return l1e_get_mfn(l1e);
 }
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 462682ba70..b67ecf4107 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4948,8 +4948,28 @@ void free_xen_pagetable_new(mfn_t mfn)
         free_xenheap_page(mfn_to_virt(mfn_x(mfn)));
 }
 
+void *alloc_map_clear_xen_pt(mfn_t *pmfn)
+{
+    mfn_t mfn = alloc_xen_pagetable_new();
+    void *ret;
+
+    if ( mfn_eq(mfn, INVALID_MFN) )
+        return NULL;
+
+    if ( pmfn )
+        *pmfn = mfn;
+    ret = map_domain_page(mfn);
+    clear_page(ret);
+
+    return ret;
+}
+
 static DEFINE_SPINLOCK(map_pgdir_lock);
 
+/*
+ * For virt_to_xen_lXe() functions, they take a virtual address and return a
+ * pointer to Xen's LX entry. Caller needs to unmap the pointer.
+ */
 static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)
 {
     l4_pgentry_t *pl4e;
@@ -4958,33 +4978,33 @@ static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)
     if ( !(l4e_get_flags(*pl4e) & _PAGE_PRESENT) )
     {
         bool locking = system_state > SYS_STATE_boot;
-        l3_pgentry_t *l3t = alloc_xen_pagetable();
+        mfn_t l3mfn;
+        l3_pgentry_t *l3t = alloc_map_clear_xen_pt(&l3mfn);
 
         if ( !l3t )
             return NULL;
-        clear_page(l3t);
+        UNMAP_DOMAIN_PAGE(l3t);
         if ( locking )
             spin_lock(&map_pgdir_lock);
         if ( !(l4e_get_flags(*pl4e) & _PAGE_PRESENT) )
         {
-            l4_pgentry_t l4e = l4e_from_paddr(__pa(l3t), __PAGE_HYPERVISOR);
+            l4_pgentry_t l4e = l4e_from_mfn(l3mfn, __PAGE_HYPERVISOR);
 
             l4e_write(pl4e, l4e);
             efi_update_l4_pgtable(l4_table_offset(v), l4e);
-            l3t = NULL;
+            l3mfn = INVALID_MFN;
         }
         if ( locking )
             spin_unlock(&map_pgdir_lock);
-        if ( l3t )
-            free_xen_pagetable(l3t);
+        free_xen_pagetable_new(l3mfn);
     }
 
-    return l4e_to_l3e(*pl4e) + l3_table_offset(v);
+    return map_l3t_from_l4e(*pl4e) + l3_table_offset(v);
 }
 
 static l2_pgentry_t *virt_to_xen_l2e(unsigned long v)
 {
-    l3_pgentry_t *pl3e;
+    l3_pgentry_t *pl3e, l3e;
 
     pl3e = virt_to_xen_l3e(v);
     if ( !pl3e )
@@ -4993,31 +5013,37 @@ static l2_pgentry_t *virt_to_xen_l2e(unsigned long v)
     if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
     {
         bool locking = system_state > SYS_STATE_boot;
-        l2_pgentry_t *l2t = alloc_xen_pagetable();
+        mfn_t l2mfn;
+        l2_pgentry_t *l2t = alloc_map_clear_xen_pt(&l2mfn);
 
         if ( !l2t )
+        {
+            unmap_domain_page(pl3e);
             return NULL;
-        clear_page(l2t);
+        }
+        UNMAP_DOMAIN_PAGE(l2t);
         if ( locking )
             spin_lock(&map_pgdir_lock);
         if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
         {
-            l3e_write(pl3e, l3e_from_paddr(__pa(l2t), __PAGE_HYPERVISOR));
-            l2t = NULL;
+            l3e_write(pl3e, l3e_from_mfn(l2mfn, __PAGE_HYPERVISOR));
+            l2mfn = INVALID_MFN;
         }
         if ( locking )
             spin_unlock(&map_pgdir_lock);
-        if ( l2t )
-            free_xen_pagetable(l2t);
+        free_xen_pagetable_new(l2mfn);
     }
 
     BUG_ON(l3e_get_flags(*pl3e) & _PAGE_PSE);
-    return l3e_to_l2e(*pl3e) + l2_table_offset(v);
+    l3e = *pl3e;
+    unmap_domain_page(pl3e);
+
+    return map_l2t_from_l3e(l3e) + l2_table_offset(v);
 }
 
 l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
 {
-    l2_pgentry_t *pl2e;
+    l2_pgentry_t *pl2e, l2e;
 
     pl2e = virt_to_xen_l2e(v);
     if ( !pl2e )
@@ -5026,26 +5052,32 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
     if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
     {
         bool locking = system_state > SYS_STATE_boot;
-        l1_pgentry_t *l1t = alloc_xen_pagetable();
+        mfn_t l1mfn;
+        l1_pgentry_t *l1t = alloc_map_clear_xen_pt(&l1mfn);
 
         if ( !l1t )
+        {
+            unmap_domain_page(pl2e);
             return NULL;
-        clear_page(l1t);
+        }
+        UNMAP_DOMAIN_PAGE(l1t);
         if ( locking )
             spin_lock(&map_pgdir_lock);
         if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
         {
-            l2e_write(pl2e, l2e_from_paddr(__pa(l1t), __PAGE_HYPERVISOR));
-            l1t = NULL;
+            l2e_write(pl2e, l2e_from_mfn(l1mfn, __PAGE_HYPERVISOR));
+            l1mfn = INVALID_MFN;
         }
         if ( locking )
             spin_unlock(&map_pgdir_lock);
-        if ( l1t )
-            free_xen_pagetable(l1t);
+        free_xen_pagetable_new(l1mfn);
     }
 
     BUG_ON(l2e_get_flags(*pl2e) & _PAGE_PSE);
-    return l2e_to_l1e(*pl2e) + l1_table_offset(v);
+    l2e = *pl2e;
+    unmap_domain_page(pl2e);
+
+    return map_l1t_from_l2e(l2e) + l1_table_offset(v);
 }
 
 /* Convert to from superpage-mapping flags for map_pages_to_xen(). */
@@ -5068,8 +5100,8 @@ int map_pages_to_xen(
     unsigned int flags)
 {
     bool locking = system_state > SYS_STATE_boot;
-    l3_pgentry_t *pl3e, ol3e;
-    l2_pgentry_t *pl2e, ol2e;
+    l3_pgentry_t *pl3e = NULL, ol3e;
+    l2_pgentry_t *pl2e = NULL, ol2e;
     l1_pgentry_t *pl1e, ol1e;
     unsigned int  i;
     int rc = -ENOMEM;
@@ -5090,6 +5122,10 @@ int map_pages_to_xen(
 
     while ( nr_mfns != 0 )
     {
+        /* Clean up mappings mapped in the previous iteration. */
+        UNMAP_DOMAIN_PAGE(pl3e);
+        UNMAP_DOMAIN_PAGE(pl2e);
+
         pl3e = virt_to_xen_l3e(virt);
 
         if ( !pl3e )
@@ -5258,6 +5294,8 @@ int map_pages_to_xen(
                 pl1e = virt_to_xen_l1e(virt);
                 if ( pl1e == NULL )
                     goto out;
+
+                UNMAP_DOMAIN_PAGE(pl1e);
             }
             else if ( l2e_get_flags(*pl2e) & _PAGE_PSE )
             {
@@ -5434,6 +5472,8 @@ int map_pages_to_xen(
     rc = 0;
 
  out:
+    unmap_domain_page(pl2e);
+    unmap_domain_page(pl3e);
     return rc;
 }
 
@@ -5457,7 +5497,7 @@ int populate_pt_range(unsigned long virt, unsigned long nr_mfns)
 int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
 {
     bool locking = system_state > SYS_STATE_boot;
-    l3_pgentry_t *pl3e;
+    l3_pgentry_t *pl3e = NULL;
     l2_pgentry_t *pl2e;
     l1_pgentry_t *pl1e;
     unsigned int  i;
@@ -5473,6 +5513,9 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
 
     while ( v < e )
     {
+        /* Clean up mappings mapped in the previous iteration. */
+        UNMAP_DOMAIN_PAGE(pl3e);
+
         pl3e = virt_to_xen_l3e(v);
 
         if ( !pl3e || !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
@@ -5701,6 +5744,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
     rc = 0;
 
  out:
+    unmap_domain_page(pl3e);
     return rc;
 }
 
diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index faebc1ddf1..9964ab2096 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -1,6 +1,7 @@
 #ifdef VMAP_VIRT_START
 #include <xen/bitmap.h>
 #include <xen/cache.h>
+#include <xen/domain_page.h>
 #include <xen/init.h>
 #include <xen/mm.h>
 #include <xen/pfn.h>
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 3d3f9d49ac..42d1a78731 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -587,6 +587,7 @@ void *alloc_xen_pagetable(void);
 void free_xen_pagetable(void *v);
 mfn_t alloc_xen_pagetable_new(void);
 void free_xen_pagetable_new(mfn_t mfn);
+void *alloc_map_clear_xen_pt(mfn_t *pmfn);
 
 l1_pgentry_t *virt_to_xen_l1e(unsigned long v);
 
diff --git a/xen/include/asm-x86/page.h b/xen/include/asm-x86/page.h
index 5acf3d3d5a..3854feb3ea 100644
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -291,7 +291,13 @@ void copy_page_sse2(void *, const void *);
 #define pfn_to_paddr(pfn)   __pfn_to_paddr(pfn)
 #define paddr_to_pfn(pa)    __paddr_to_pfn(pa)
 #define paddr_to_pdx(pa)    pfn_to_pdx(paddr_to_pfn(pa))
-#define vmap_to_mfn(va)     _mfn(l1e_get_pfn(*virt_to_xen_l1e((unsigned long)(va))))
+
+#define vmap_to_mfn(va) ({                                                  \
+        const l1_pgentry_t *pl1e_ = virt_to_xen_l1e((unsigned long)(va));   \
+        mfn_t mfn_ = l1e_get_mfn(*pl1e_);                                   \
+        unmap_domain_page(pl1e_);                                           \
+        mfn_; })
+
 #define vmap_to_page(va)    mfn_to_page(vmap_to_mfn(va))
 
 #endif /* !defined(__ASSEMBLY__) */
-- 
2.24.1.AMZN




* [PATCH v7 04/15] x86/mm: switch to new APIs in map_pages_to_xen
  2020-05-29 11:11 [PATCH v7 00/15] switch to domheap for Xen page tables Hongyan Xia
                   ` (2 preceding siblings ...)
  2020-05-29 11:11 ` [PATCH v7 03/15] x86/mm: rewrite virt_to_xen_l*e Hongyan Xia
@ 2020-05-29 11:11 ` Hongyan Xia
  2020-07-14 11:52   ` Jan Beulich
  2020-05-29 11:11 ` [PATCH v7 05/15] x86/mm: switch to new APIs in modify_xen_mappings Hongyan Xia
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Hongyan Xia @ 2020-05-29 11:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, julien, Wei Liu, Jan Beulich, Roger Pau Monné

From: Wei Liu <wei.liu2@citrix.com>

Page tables allocated in that function are now mapped and unmapped
rather than accessed through the direct map.
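
The recurring pattern in the hunks below, condensed for illustration:

    mfn_t l2mfn = alloc_xen_pagetable_new();
    l2_pgentry_t *l2t;

    if ( mfn_eq(l2mfn, INVALID_MFN) )
        goto out;

    l2t = map_domain_page(l2mfn);
    /* ... populate the new table ... */
    UNMAP_DOMAIN_PAGE(l2t);

    /* Install the table by MFN rather than by direct-map address. */
    l3e_write_atomic(pl3e, l3e_from_mfn(l2mfn, __PAGE_HYPERVISOR));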

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
---
 xen/arch/x86/mm.c | 60 ++++++++++++++++++++++++++++-------------------
 1 file changed, 36 insertions(+), 24 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index b67ecf4107..9cb1c6b347 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5154,7 +5154,7 @@ int map_pages_to_xen(
                 }
                 else
                 {
-                    l2_pgentry_t *l2t = l3e_to_l2e(ol3e);
+                    l2_pgentry_t *l2t = map_l2t_from_l3e(ol3e);
 
                     for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
                     {
@@ -5166,10 +5166,11 @@ int map_pages_to_xen(
                         else
                         {
                             unsigned int j;
-                            const l1_pgentry_t *l1t = l2e_to_l1e(ol2e);
+                            const l1_pgentry_t *l1t = map_l1t_from_l2e(ol2e);
 
                             for ( j = 0; j < L1_PAGETABLE_ENTRIES; j++ )
                                 flush_flags(l1e_get_flags(l1t[j]));
+                            unmap_domain_page(l1t);
                         }
                     }
                     flush_area(virt, flush_flags);
@@ -5178,9 +5179,10 @@ int map_pages_to_xen(
                         ol2e = l2t[i];
                         if ( (l2e_get_flags(ol2e) & _PAGE_PRESENT) &&
                              !(l2e_get_flags(ol2e) & _PAGE_PSE) )
-                            free_xen_pagetable(l2e_to_l1e(ol2e));
+                            free_xen_pagetable_new(l2e_get_mfn(ol2e));
                     }
-                    free_xen_pagetable(l2t);
+                    unmap_domain_page(l2t);
+                    free_xen_pagetable_new(l3e_get_mfn(ol3e));
                 }
             }
 
@@ -5197,6 +5199,7 @@ int map_pages_to_xen(
             unsigned int flush_flags =
                 FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
             l2_pgentry_t *l2t;
+            mfn_t l2mfn;
 
             /* Skip this PTE if there is no change. */
             if ( ((l3e_get_pfn(ol3e) & ~(L2_PAGETABLE_ENTRIES *
@@ -5218,15 +5221,17 @@ int map_pages_to_xen(
                 continue;
             }
 
-            l2t = alloc_xen_pagetable();
-            if ( l2t == NULL )
+            l2mfn = alloc_xen_pagetable_new();
+            if ( mfn_eq(l2mfn, INVALID_MFN) )
                 goto out;
 
+            l2t = map_domain_page(l2mfn);
             for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
                 l2e_write(l2t + i,
                           l2e_from_pfn(l3e_get_pfn(ol3e) +
                                        (i << PAGETABLE_ORDER),
                                        l3e_get_flags(ol3e)));
+            UNMAP_DOMAIN_PAGE(l2t);
 
             if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
                 flush_flags |= FLUSH_TLB_GLOBAL;
@@ -5236,15 +5241,15 @@ int map_pages_to_xen(
             if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
                  (l3e_get_flags(*pl3e) & _PAGE_PSE) )
             {
-                l3e_write_atomic(pl3e, l3e_from_mfn(virt_to_mfn(l2t),
-                                                    __PAGE_HYPERVISOR));
-                l2t = NULL;
+                l3e_write_atomic(pl3e,
+                                 l3e_from_mfn(l2mfn, __PAGE_HYPERVISOR));
+                l2mfn = INVALID_MFN;
             }
             if ( locking )
                 spin_unlock(&map_pgdir_lock);
             flush_area(virt, flush_flags);
-            if ( l2t )
-                free_xen_pagetable(l2t);
+
+            free_xen_pagetable_new(l2mfn);
         }
 
         pl2e = virt_to_xen_l2e(virt);
@@ -5272,12 +5277,13 @@ int map_pages_to_xen(
                 }
                 else
                 {
-                    l1_pgentry_t *l1t = l2e_to_l1e(ol2e);
+                    l1_pgentry_t *l1t = map_l1t_from_l2e(ol2e);
 
                     for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
                         flush_flags(l1e_get_flags(l1t[i]));
                     flush_area(virt, flush_flags);
-                    free_xen_pagetable(l1t);
+                    unmap_domain_page(l1t);
+                    free_xen_pagetable_new(l2e_get_mfn(ol2e));
                 }
             }
 
@@ -5302,6 +5308,7 @@ int map_pages_to_xen(
                 unsigned int flush_flags =
                     FLUSH_TLB | FLUSH_ORDER(PAGETABLE_ORDER);
                 l1_pgentry_t *l1t;
+                mfn_t l1mfn;
 
                 /* Skip this PTE if there is no change. */
                 if ( (((l2e_get_pfn(*pl2e) & ~(L1_PAGETABLE_ENTRIES - 1)) +
@@ -5321,14 +5328,16 @@ int map_pages_to_xen(
                     goto check_l3;
                 }
 
-                l1t = alloc_xen_pagetable();
-                if ( l1t == NULL )
+                l1mfn = alloc_xen_pagetable_new();
+                if ( mfn_eq(l1mfn, INVALID_MFN) )
                     goto out;
 
+                l1t = map_domain_page(l1mfn);
                 for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
                     l1e_write(&l1t[i],
                               l1e_from_pfn(l2e_get_pfn(*pl2e) + i,
                                            lNf_to_l1f(l2e_get_flags(*pl2e))));
+                UNMAP_DOMAIN_PAGE(l1t);
 
                 if ( l2e_get_flags(*pl2e) & _PAGE_GLOBAL )
                     flush_flags |= FLUSH_TLB_GLOBAL;
@@ -5338,20 +5347,21 @@ int map_pages_to_xen(
                 if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) &&
                      (l2e_get_flags(*pl2e) & _PAGE_PSE) )
                 {
-                    l2e_write_atomic(pl2e, l2e_from_mfn(virt_to_mfn(l1t),
+                    l2e_write_atomic(pl2e, l2e_from_mfn(l1mfn,
                                                         __PAGE_HYPERVISOR));
-                    l1t = NULL;
+                    l1mfn = INVALID_MFN;
                 }
                 if ( locking )
                     spin_unlock(&map_pgdir_lock);
                 flush_area(virt, flush_flags);
-                if ( l1t )
-                    free_xen_pagetable(l1t);
+
+                free_xen_pagetable_new(l1mfn);
             }
 
-            pl1e  = l2e_to_l1e(*pl2e) + l1_table_offset(virt);
+            pl1e  = map_l1t_from_l2e(*pl2e) + l1_table_offset(virt);
             ol1e  = *pl1e;
             l1e_write_atomic(pl1e, l1e_from_mfn(mfn, flags));
+            UNMAP_DOMAIN_PAGE(pl1e);
             if ( (l1e_get_flags(ol1e) & _PAGE_PRESENT) )
             {
                 unsigned int flush_flags = FLUSH_TLB | FLUSH_ORDER(0);
@@ -5395,12 +5405,13 @@ int map_pages_to_xen(
                     goto check_l3;
                 }
 
-                l1t = l2e_to_l1e(ol2e);
+                l1t = map_l1t_from_l2e(ol2e);
                 base_mfn = l1e_get_pfn(l1t[0]) & ~(L1_PAGETABLE_ENTRIES - 1);
                 for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
                     if ( (l1e_get_pfn(l1t[i]) != (base_mfn + i)) ||
                          (l1e_get_flags(l1t[i]) != flags) )
                         break;
+                UNMAP_DOMAIN_PAGE(l1t);
                 if ( i == L1_PAGETABLE_ENTRIES )
                 {
                     l2e_write_atomic(pl2e, l2e_from_pfn(base_mfn,
@@ -5410,7 +5421,7 @@ int map_pages_to_xen(
                     flush_area(virt - PAGE_SIZE,
                                FLUSH_TLB_GLOBAL |
                                FLUSH_ORDER(PAGETABLE_ORDER));
-                    free_xen_pagetable(l2e_to_l1e(ol2e));
+                    free_xen_pagetable_new(l2e_get_mfn(ol2e));
                 }
                 else if ( locking )
                     spin_unlock(&map_pgdir_lock);
@@ -5443,7 +5454,7 @@ int map_pages_to_xen(
                 continue;
             }
 
-            l2t = l3e_to_l2e(ol3e);
+            l2t = map_l2t_from_l3e(ol3e);
             base_mfn = l2e_get_pfn(l2t[0]) & ~(L2_PAGETABLE_ENTRIES *
                                               L1_PAGETABLE_ENTRIES - 1);
             for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
@@ -5451,6 +5462,7 @@ int map_pages_to_xen(
                       (base_mfn + (i << PAGETABLE_ORDER))) ||
                      (l2e_get_flags(l2t[i]) != l1f_to_lNf(flags)) )
                     break;
+            UNMAP_DOMAIN_PAGE(l2t);
             if ( i == L2_PAGETABLE_ENTRIES )
             {
                 l3e_write_atomic(pl3e, l3e_from_pfn(base_mfn,
@@ -5460,7 +5472,7 @@ int map_pages_to_xen(
                 flush_area(virt - PAGE_SIZE,
                            FLUSH_TLB_GLOBAL |
                            FLUSH_ORDER(2*PAGETABLE_ORDER));
-                free_xen_pagetable(l3e_to_l2e(ol3e));
+                free_xen_pagetable_new(l3e_get_mfn(ol3e));
             }
             else if ( locking )
                 spin_unlock(&map_pgdir_lock);
-- 
2.24.1.AMZN




* [PATCH v7 05/15] x86/mm: switch to new APIs in modify_xen_mappings
  2020-05-29 11:11 [PATCH v7 00/15] switch to domheap for Xen page tables Hongyan Xia
                   ` (3 preceding siblings ...)
  2020-05-29 11:11 ` [PATCH v7 04/15] x86/mm: switch to new APIs in map_pages_to_xen Hongyan Xia
@ 2020-05-29 11:11 ` Hongyan Xia
  2020-07-14 11:55   ` Jan Beulich
  2020-05-29 11:11 ` [PATCH v7 06/15] x86_64/mm: introduce pl2e in paging_init Hongyan Xia
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Hongyan Xia @ 2020-05-29 11:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, julien, Wei Liu, Jan Beulich, Roger Pau Monné

From: Wei Liu <wei.liu2@citrix.com>

Page tables allocated in that function are now mapped and unmapped
rather than accessed through the direct map.

Note that pl2e may now be mapped and unmapped in different iterations,
so we need to add clean-ups for that.
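
Condensed for illustration, the loop now unmaps whatever the previous
iteration left mapped, and the exit path unmaps once more:

    while ( v < e )
    {
        /* Clean up mappings mapped in the previous iteration. */
        UNMAP_DOMAIN_PAGE(pl2e);
        UNMAP_DOMAIN_PAGE(pl3e);

        pl3e = virt_to_xen_l3e(v);
        /* ... */
        pl2e = map_l2t_from_l3e(*pl3e) + l2_table_offset(v);
        /* ... */
    }

    rc = 0;

 out:
    unmap_domain_page(pl2e);
    unmap_domain_page(pl3e);
    return rc;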

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

---
Changed in v7:
- use normal unmap in the error path.
---
 xen/arch/x86/mm.c | 57 ++++++++++++++++++++++++++++++-----------------
 1 file changed, 36 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 9cb1c6b347..26694e2f30 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5510,7 +5510,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
 {
     bool locking = system_state > SYS_STATE_boot;
     l3_pgentry_t *pl3e = NULL;
-    l2_pgentry_t *pl2e;
+    l2_pgentry_t *pl2e = NULL;
     l1_pgentry_t *pl1e;
     unsigned int  i;
     unsigned long v = s;
@@ -5526,6 +5526,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
     while ( v < e )
     {
         /* Clean up mappings mapped in the previous iteration. */
+        UNMAP_DOMAIN_PAGE(pl2e);
         UNMAP_DOMAIN_PAGE(pl3e);
 
         pl3e = virt_to_xen_l3e(v);
@@ -5543,6 +5544,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
         if ( l3e_get_flags(*pl3e) & _PAGE_PSE )
         {
             l2_pgentry_t *l2t;
+            mfn_t l2mfn;
 
             if ( l2_table_offset(v) == 0 &&
                  l1_table_offset(v) == 0 &&
@@ -5559,35 +5561,38 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
             }
 
             /* PAGE1GB: shatter the superpage and fall through. */
-            l2t = alloc_xen_pagetable();
-            if ( !l2t )
+            l2mfn = alloc_xen_pagetable_new();
+            if ( mfn_eq(l2mfn, INVALID_MFN) )
                 goto out;
 
+            l2t = map_domain_page(l2mfn);
             for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
                 l2e_write(l2t + i,
                           l2e_from_pfn(l3e_get_pfn(*pl3e) +
                                        (i << PAGETABLE_ORDER),
                                        l3e_get_flags(*pl3e)));
+            UNMAP_DOMAIN_PAGE(l2t);
+
             if ( locking )
                 spin_lock(&map_pgdir_lock);
             if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
                  (l3e_get_flags(*pl3e) & _PAGE_PSE) )
             {
-                l3e_write_atomic(pl3e, l3e_from_mfn(virt_to_mfn(l2t),
-                                                    __PAGE_HYPERVISOR));
-                l2t = NULL;
+                l3e_write_atomic(pl3e,
+                                 l3e_from_mfn(l2mfn, __PAGE_HYPERVISOR));
+                l2mfn = INVALID_MFN;
             }
             if ( locking )
                 spin_unlock(&map_pgdir_lock);
-            if ( l2t )
-                free_xen_pagetable(l2t);
+
+            free_xen_pagetable_new(l2mfn);
         }
 
         /*
          * The L3 entry has been verified to be present, and we've dealt with
          * 1G pages as well, so the L2 table cannot require allocation.
          */
-        pl2e = l3e_to_l2e(*pl3e) + l2_table_offset(v);
+        pl2e = map_l2t_from_l3e(*pl3e) + l2_table_offset(v);
 
         if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
         {
@@ -5615,41 +5620,45 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
             else
             {
                 l1_pgentry_t *l1t;
-
                 /* PSE: shatter the superpage and try again. */
-                l1t = alloc_xen_pagetable();
-                if ( !l1t )
+                mfn_t l1mfn = alloc_xen_pagetable_new();
+
+                if ( mfn_eq(l1mfn, INVALID_MFN) )
                     goto out;
 
+                l1t = map_domain_page(l1mfn);
                 for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
                     l1e_write(&l1t[i],
                               l1e_from_pfn(l2e_get_pfn(*pl2e) + i,
                                            l2e_get_flags(*pl2e) & ~_PAGE_PSE));
+                UNMAP_DOMAIN_PAGE(l1t);
+
                 if ( locking )
                     spin_lock(&map_pgdir_lock);
                 if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) &&
                      (l2e_get_flags(*pl2e) & _PAGE_PSE) )
                 {
-                    l2e_write_atomic(pl2e, l2e_from_mfn(virt_to_mfn(l1t),
+                    l2e_write_atomic(pl2e, l2e_from_mfn(l1mfn,
                                                         __PAGE_HYPERVISOR));
-                    l1t = NULL;
+                    l1mfn = INVALID_MFN;
                 }
                 if ( locking )
                     spin_unlock(&map_pgdir_lock);
-                if ( l1t )
-                    free_xen_pagetable(l1t);
+
+                free_xen_pagetable_new(l1mfn);
             }
         }
         else
         {
             l1_pgentry_t nl1e, *l1t;
+            mfn_t l1mfn;
 
             /*
              * Ordinary 4kB mapping: The L2 entry has been verified to be
              * present, and we've dealt with 2M pages as well, so the L1 table
              * cannot require allocation.
              */
-            pl1e = l2e_to_l1e(*pl2e) + l1_table_offset(v);
+            pl1e = map_l1t_from_l2e(*pl2e) + l1_table_offset(v);
 
             /* Confirm the caller isn't trying to create new mappings. */
             if ( !(l1e_get_flags(*pl1e) & _PAGE_PRESENT) )
@@ -5660,6 +5669,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                                (l1e_get_flags(*pl1e) & ~FLAGS_MASK) | nf);
 
             l1e_write_atomic(pl1e, nl1e);
+            UNMAP_DOMAIN_PAGE(pl1e);
             v += PAGE_SIZE;
 
             /*
@@ -5689,10 +5699,12 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                 continue;
             }
 
-            l1t = l2e_to_l1e(*pl2e);
+            l1mfn = l2e_get_mfn(*pl2e);
+            l1t = map_domain_page(l1mfn);
             for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
                 if ( l1e_get_intpte(l1t[i]) != 0 )
                     break;
+            UNMAP_DOMAIN_PAGE(l1t);
             if ( i == L1_PAGETABLE_ENTRIES )
             {
                 /* Empty: zap the L2E and free the L1 page. */
@@ -5700,7 +5712,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                 if ( locking )
                     spin_unlock(&map_pgdir_lock);
                 flush_area(NULL, FLUSH_TLB_GLOBAL); /* flush before free */
-                free_xen_pagetable(l1t);
+                free_xen_pagetable_new(l1mfn);
             }
             else if ( locking )
                 spin_unlock(&map_pgdir_lock);
@@ -5731,11 +5743,13 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
 
         {
             l2_pgentry_t *l2t;
+            mfn_t l2mfn = l3e_get_mfn(*pl3e);
 
-            l2t = l3e_to_l2e(*pl3e);
+            l2t = map_domain_page(l2mfn);
             for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
                 if ( l2e_get_intpte(l2t[i]) != 0 )
                     break;
+            UNMAP_DOMAIN_PAGE(l2t);
             if ( i == L2_PAGETABLE_ENTRIES )
             {
                 /* Empty: zap the L3E and free the L2 page. */
@@ -5743,7 +5757,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                 if ( locking )
                     spin_unlock(&map_pgdir_lock);
                 flush_area(NULL, FLUSH_TLB_GLOBAL); /* flush before free */
-                free_xen_pagetable(l2t);
+                free_xen_pagetable_new(l2mfn);
             }
             else if ( locking )
                 spin_unlock(&map_pgdir_lock);
@@ -5756,6 +5770,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
     rc = 0;
 
  out:
+    unmap_domain_page(pl2e);
     unmap_domain_page(pl3e);
     return rc;
 }
-- 
2.24.1.AMZN




* [PATCH v7 06/15] x86_64/mm: introduce pl2e in paging_init
  2020-05-29 11:11 [PATCH v7 00/15] switch to domheap for Xen page tables Hongyan Xia
                   ` (4 preceding siblings ...)
  2020-05-29 11:11 ` [PATCH v7 05/15] x86/mm: switch to new APIs in modify_xen_mappings Hongyan Xia
@ 2020-05-29 11:11 ` Hongyan Xia
  2020-07-14 12:02   ` Jan Beulich
  2020-05-29 11:11 ` [PATCH v7 07/15] x86_64/mm: switch to new APIs " Hongyan Xia
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Hongyan Xia @ 2020-05-29 11:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, julien, Wei Liu, Jan Beulich, Roger Pau Monné

From: Wei Liu <wei.liu2@citrix.com>

We will soon map and unmap pages in paging_init(). Introduce pl2e so
that l2_ro_mpt can keep pointing at the page table itself while pl2e
is used to iterate over its entries.

No functional change.
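
In other words (an illustrative condensation of the change below),
l2_ro_mpt stays at the table base and pl2e becomes the cursor:

    if ( (l2_ro_mpt = alloc_xen_pagetable()) == NULL )
        goto nomem;
    clear_page(l2_ro_mpt);
    l3e_write(&l3_ro_mpt[l3_table_offset(va)],
              l3e_from_paddr(__pa(l2_ro_mpt),
                             __PAGE_HYPERVISOR_RO | _PAGE_USER));
    pl2e = l2_ro_mpt;          /* cursor starts at the table base */
    /* ... */
    l2e_write(pl2e, /* ... */);  /* entries are written via the cursor */
    pl2e++;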

Signed-off-by: Wei Liu <wei.liu2@citrix.com>

---
Changed in v7:
- reword commit message.
---
 xen/arch/x86/x86_64/mm.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 102079a801..243014a119 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -479,7 +479,7 @@ void __init paging_init(void)
     unsigned long i, mpt_size, va;
     unsigned int n, memflags;
     l3_pgentry_t *l3_ro_mpt;
-    l2_pgentry_t *l2_ro_mpt = NULL;
+    l2_pgentry_t *pl2e = NULL, *l2_ro_mpt = NULL;
     struct page_info *l1_pg;
 
     /*
@@ -529,7 +529,7 @@ void __init paging_init(void)
             (L2_PAGETABLE_SHIFT - 3 + PAGE_SHIFT)));
 
         if ( cpu_has_page1gb &&
-             !((unsigned long)l2_ro_mpt & ~PAGE_MASK) &&
+             !((unsigned long)pl2e & ~PAGE_MASK) &&
              (mpt_size >> L3_PAGETABLE_SHIFT) > (i >> PAGETABLE_ORDER) )
         {
             unsigned int k, holes;
@@ -589,7 +589,7 @@ void __init paging_init(void)
             memset((void *)(RDWR_MPT_VIRT_START + (i << L2_PAGETABLE_SHIFT)),
                    0xFF, 1UL << L2_PAGETABLE_SHIFT);
         }
-        if ( !((unsigned long)l2_ro_mpt & ~PAGE_MASK) )
+        if ( !((unsigned long)pl2e & ~PAGE_MASK) )
         {
             if ( (l2_ro_mpt = alloc_xen_pagetable()) == NULL )
                 goto nomem;
@@ -597,13 +597,14 @@ void __init paging_init(void)
             l3e_write(&l3_ro_mpt[l3_table_offset(va)],
                       l3e_from_paddr(__pa(l2_ro_mpt),
                                      __PAGE_HYPERVISOR_RO | _PAGE_USER));
+            pl2e = l2_ro_mpt;
             ASSERT(!l2_table_offset(va));
         }
         /* NB. Cannot be GLOBAL: guest user mode should not see it. */
         if ( l1_pg )
-            l2e_write(l2_ro_mpt, l2e_from_page(
+            l2e_write(pl2e, l2e_from_page(
                 l1_pg, /*_PAGE_GLOBAL|*/_PAGE_PSE|_PAGE_USER|_PAGE_PRESENT));
-        l2_ro_mpt++;
+        pl2e++;
     }
 #undef CNT
 #undef MFN
@@ -613,6 +614,7 @@ void __init paging_init(void)
         goto nomem;
     compat_idle_pg_table_l2 = l2_ro_mpt;
     clear_page(l2_ro_mpt);
+    pl2e = l2_ro_mpt;
     /* Allocate and map the compatibility mode machine-to-phys table. */
     mpt_size = (mpt_size >> 1) + (1UL << (L2_PAGETABLE_SHIFT - 1));
     if ( mpt_size > RDWR_COMPAT_MPT_VIRT_END - RDWR_COMPAT_MPT_VIRT_START )
@@ -625,7 +627,7 @@ void __init paging_init(void)
              sizeof(*compat_machine_to_phys_mapping))
     BUILD_BUG_ON((sizeof(*frame_table) & ~sizeof(*frame_table)) % \
                  sizeof(*compat_machine_to_phys_mapping));
-    for ( i = 0; i < (mpt_size >> L2_PAGETABLE_SHIFT); i++, l2_ro_mpt++ )
+    for ( i = 0; i < (mpt_size >> L2_PAGETABLE_SHIFT); i++, pl2e++ )
     {
         memflags = MEMF_node(phys_to_nid(i <<
             (L2_PAGETABLE_SHIFT - 2 + PAGE_SHIFT)));
@@ -647,7 +649,7 @@ void __init paging_init(void)
                         (i << L2_PAGETABLE_SHIFT)),
                0xFF, 1UL << L2_PAGETABLE_SHIFT);
         /* NB. Cannot be GLOBAL as the ptes get copied into per-VM space. */
-        l2e_write(l2_ro_mpt, l2e_from_page(l1_pg, _PAGE_PSE|_PAGE_PRESENT));
+        l2e_write(pl2e, l2e_from_page(l1_pg, _PAGE_PSE|_PAGE_PRESENT));
     }
 #undef CNT
 #undef MFN
-- 
2.24.1.AMZN




* [PATCH v7 07/15] x86_64/mm: switch to new APIs in paging_init
  2020-05-29 11:11 [PATCH v7 00/15] switch to domheap for Xen page tables Hongyan Xia
                   ` (5 preceding siblings ...)
  2020-05-29 11:11 ` [PATCH v7 06/15] x86_64/mm: introduce pl2e in paging_init Hongyan Xia
@ 2020-05-29 11:11 ` Hongyan Xia
  2020-07-14 12:03   ` Jan Beulich
  2020-05-29 11:11 ` [PATCH v7 08/15] x86_64/mm: switch to new APIs in setup_m2p_table Hongyan Xia
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Hongyan Xia @ 2020-05-29 11:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, julien, Wei Liu, Jan Beulich, Roger Pau Monné

From: Wei Liu <wei.liu2@citrix.com>

Map and unmap pages instead of relying on the direct map.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

---
Changed in v7:
- use the new alloc_map_clear_xen_pt() helper.
- move the unmap of pl3t up a bit.
- remove the unmaps in the nomem path.
---
 xen/arch/x86/x86_64/mm.c | 36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 243014a119..8877ac7bb7 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -481,6 +481,7 @@ void __init paging_init(void)
     l3_pgentry_t *l3_ro_mpt;
     l2_pgentry_t *pl2e = NULL, *l2_ro_mpt = NULL;
     struct page_info *l1_pg;
+    mfn_t l3_ro_mpt_mfn, l2_ro_mpt_mfn;
 
     /*
      * We setup the L3s for 1:1 mapping if host support memory hotplug
@@ -493,22 +494,23 @@ void __init paging_init(void)
         if ( !(l4e_get_flags(idle_pg_table[l4_table_offset(va)]) &
               _PAGE_PRESENT) )
         {
-            l3_pgentry_t *pl3t = alloc_xen_pagetable();
+            mfn_t l3mfn;
+            l3_pgentry_t *pl3t = alloc_map_clear_xen_pt(&l3mfn);
 
             if ( !pl3t )
                 goto nomem;
-            clear_page(pl3t);
+            UNMAP_DOMAIN_PAGE(pl3t);
             l4e_write(&idle_pg_table[l4_table_offset(va)],
-                      l4e_from_paddr(__pa(pl3t), __PAGE_HYPERVISOR_RW));
+                      l4e_from_mfn(l3mfn, __PAGE_HYPERVISOR_RW));
         }
     }
 
     /* Create user-accessible L2 directory to map the MPT for guests. */
-    if ( (l3_ro_mpt = alloc_xen_pagetable()) == NULL )
+    l3_ro_mpt = alloc_map_clear_xen_pt(&l3_ro_mpt_mfn);
+    if ( !l3_ro_mpt )
         goto nomem;
-    clear_page(l3_ro_mpt);
     l4e_write(&idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)],
-              l4e_from_paddr(__pa(l3_ro_mpt), __PAGE_HYPERVISOR_RO | _PAGE_USER));
+              l4e_from_mfn(l3_ro_mpt_mfn, __PAGE_HYPERVISOR_RO | _PAGE_USER));
 
     /*
      * Allocate and map the machine-to-phys table.
@@ -591,12 +593,15 @@ void __init paging_init(void)
         }
         if ( !((unsigned long)pl2e & ~PAGE_MASK) )
         {
-            if ( (l2_ro_mpt = alloc_xen_pagetable()) == NULL )
+            UNMAP_DOMAIN_PAGE(l2_ro_mpt);
+
+            l2_ro_mpt = alloc_map_clear_xen_pt(&l2_ro_mpt_mfn);
+            if ( !l2_ro_mpt )
                 goto nomem;
-            clear_page(l2_ro_mpt);
+
             l3e_write(&l3_ro_mpt[l3_table_offset(va)],
-                      l3e_from_paddr(__pa(l2_ro_mpt),
-                                     __PAGE_HYPERVISOR_RO | _PAGE_USER));
+                      l3e_from_mfn(l2_ro_mpt_mfn,
+                                   __PAGE_HYPERVISOR_RO | _PAGE_USER));
             pl2e = l2_ro_mpt;
             ASSERT(!l2_table_offset(va));
         }
@@ -608,13 +613,16 @@ void __init paging_init(void)
     }
 #undef CNT
 #undef MFN
+    UNMAP_DOMAIN_PAGE(l2_ro_mpt);
+    UNMAP_DOMAIN_PAGE(l3_ro_mpt);
 
     /* Create user-accessible L2 directory to map the MPT for compat guests. */
-    if ( (l2_ro_mpt = alloc_xen_pagetable()) == NULL )
+    l2_ro_mpt_mfn = alloc_xen_pagetable_new();
+    if ( mfn_eq(l2_ro_mpt_mfn, INVALID_MFN) )
         goto nomem;
-    compat_idle_pg_table_l2 = l2_ro_mpt;
-    clear_page(l2_ro_mpt);
-    pl2e = l2_ro_mpt;
+    compat_idle_pg_table_l2 = map_domain_page_global(l2_ro_mpt_mfn);
+    clear_page(compat_idle_pg_table_l2);
+    pl2e = compat_idle_pg_table_l2;
     /* Allocate and map the compatibility mode machine-to-phys table. */
     mpt_size = (mpt_size >> 1) + (1UL << (L2_PAGETABLE_SHIFT - 1));
     if ( mpt_size > RDWR_COMPAT_MPT_VIRT_END - RDWR_COMPAT_MPT_VIRT_START )
-- 
2.24.1.AMZN




* [PATCH v7 08/15] x86_64/mm: switch to new APIs in setup_m2p_table
  2020-05-29 11:11 [PATCH v7 00/15] switch to domheap for Xen page tables Hongyan Xia
                   ` (6 preceding siblings ...)
  2020-05-29 11:11 ` [PATCH v7 07/15] x86_64/mm: switch to new APIs " Hongyan Xia
@ 2020-05-29 11:11 ` Hongyan Xia
  2020-07-14 12:32   ` Jan Beulich
  2020-07-14 12:33   ` Jan Beulich
  2020-05-29 11:11 ` [PATCH v7 09/15] efi: use new page table APIs in copy_mapping Hongyan Xia
                   ` (6 subsequent siblings)
  14 siblings, 2 replies; 32+ messages in thread
From: Hongyan Xia @ 2020-05-29 11:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, julien, Wei Liu, Jan Beulich, Roger Pau Monné

From: Wei Liu <wei.liu2@citrix.com>

Avoid repeatedly mapping l2_ro_mpt by keeping it mapped across loop
iterations, and only unmap and remap it when crossing 1G boundaries.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

---
Changed in v7:
- avoid repetitive mapping of l2_ro_mpt.
- edit commit message.
- switch to alloc_map_clear_xen_pt().
---
 xen/arch/x86/x86_64/mm.c | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 8877ac7bb7..cfc3de9091 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -385,7 +385,8 @@ static int setup_m2p_table(struct mem_hotadd_info *info)
 
     ASSERT(l4e_get_flags(idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)])
             & _PAGE_PRESENT);
-    l3_ro_mpt = l4e_to_l3e(idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)]);
+    l3_ro_mpt = map_l3t_from_l4e(
+                    idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)]);
 
     smap = (info->spfn & (~((1UL << (L2_PAGETABLE_SHIFT - 3)) -1)));
     emap = ((info->epfn + ((1UL << (L2_PAGETABLE_SHIFT - 3)) - 1 )) &
@@ -403,6 +404,10 @@ static int setup_m2p_table(struct mem_hotadd_info *info)
     i = smap;
     while ( i < emap )
     {
+        if ( (RO_MPT_VIRT_START + i * sizeof(*machine_to_phys_mapping)) &
+             ((1UL << L3_PAGETABLE_SHIFT) - 1) )
+            UNMAP_DOMAIN_PAGE(l2_ro_mpt);
+
         switch ( m2p_mapped(i) )
         {
         case M2P_1G_MAPPED:
@@ -438,32 +443,29 @@ static int setup_m2p_table(struct mem_hotadd_info *info)
 
             ASSERT(!(l3e_get_flags(l3_ro_mpt[l3_table_offset(va)]) &
                   _PAGE_PSE));
-            if ( l3e_get_flags(l3_ro_mpt[l3_table_offset(va)]) &
-              _PAGE_PRESENT )
-                l2_ro_mpt = l3e_to_l2e(l3_ro_mpt[l3_table_offset(va)]) +
-                  l2_table_offset(va);
-            else
+            if ( (l3e_get_flags(l3_ro_mpt[l3_table_offset(va)]) &
+                  _PAGE_PRESENT) && !l2_ro_mpt)
+                l2_ro_mpt = map_l2t_from_l3e(l3_ro_mpt[l3_table_offset(va)]);
+            else if ( !l2_ro_mpt )
             {
-                l2_ro_mpt = alloc_xen_pagetable();
+                mfn_t l2_ro_mpt_mfn;
+
+                l2_ro_mpt = alloc_map_clear_xen_pt(&l2_ro_mpt_mfn);
                 if ( !l2_ro_mpt )
                 {
                     ret = -ENOMEM;
                     goto error;
                 }
 
-                clear_page(l2_ro_mpt);
                 l3e_write(&l3_ro_mpt[l3_table_offset(va)],
-                          l3e_from_paddr(__pa(l2_ro_mpt),
-                                         __PAGE_HYPERVISOR_RO | _PAGE_USER));
-                l2_ro_mpt += l2_table_offset(va);
+                          l3e_from_mfn(l2_ro_mpt_mfn,
+                                       __PAGE_HYPERVISOR_RO | _PAGE_USER));
             }
 
             /* NB. Cannot be GLOBAL: guest user mode should not see it. */
-            l2e_write(l2_ro_mpt, l2e_from_mfn(mfn,
+            l2e_write(&l2_ro_mpt[l2_table_offset(va)], l2e_from_mfn(mfn,
                    /*_PAGE_GLOBAL|*/_PAGE_PSE|_PAGE_USER|_PAGE_PRESENT));
         }
-        if ( !((unsigned long)l2_ro_mpt & ~PAGE_MASK) )
-            l2_ro_mpt = NULL;
         i += ( 1UL << (L2_PAGETABLE_SHIFT - 3));
     }
 #undef CNT
@@ -471,6 +473,8 @@ static int setup_m2p_table(struct mem_hotadd_info *info)
 
     ret = setup_compat_m2p_table(info);
 error:
+    unmap_domain_page(l2_ro_mpt);
+    unmap_domain_page(l3_ro_mpt);
     return ret;
 }
 
-- 
2.24.1.AMZN




* [PATCH v7 09/15] efi: use new page table APIs in copy_mapping
  2020-05-29 11:11 [PATCH v7 00/15] switch to domheap for Xen page tables Hongyan Xia
                   ` (7 preceding siblings ...)
  2020-05-29 11:11 ` [PATCH v7 08/15] x86_64/mm: switch to new APIs in setup_m2p_table Hongyan Xia
@ 2020-05-29 11:11 ` Hongyan Xia
  2020-07-14 12:42   ` Jan Beulich
  2020-05-29 11:11 ` [PATCH v7 10/15] efi: switch to new APIs in EFI code Hongyan Xia
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Hongyan Xia @ 2020-05-29 11:11 UTC (permalink / raw)
  To: xen-devel; +Cc: julien, Jan Beulich

From: Wei Liu <wei.liu2@citrix.com>

After inspection, ARM does not have alloc_xen_pagetable(), so this
function is x86-only, which means it is safe for us to change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

---
Changed in v7:
- hoist l3 variables out of the loop to avoid repetitive mappings.
---
 xen/common/efi/boot.c | 34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c
index a6f84c945a..2599ae50d2 100644
--- a/xen/common/efi/boot.c
+++ b/xen/common/efi/boot.c
@@ -6,6 +6,7 @@
 #include <xen/compile.h>
 #include <xen/ctype.h>
 #include <xen/dmi.h>
+#include <xen/domain_page.h>
 #include <xen/init.h>
 #include <xen/keyhandler.h>
 #include <xen/lib.h>
@@ -1442,29 +1443,42 @@ static __init void copy_mapping(unsigned long mfn, unsigned long end,
                                                  unsigned long emfn))
 {
     unsigned long next;
+    l3_pgentry_t *l3src = NULL, *l3dst = NULL;
 
     for ( ; mfn < end; mfn = next )
     {
         l4_pgentry_t l4e = efi_l4_pgtable[l4_table_offset(mfn << PAGE_SHIFT)];
-        l3_pgentry_t *l3src, *l3dst;
         unsigned long va = (unsigned long)mfn_to_virt(mfn);
 
+        if ( !((mfn << PAGE_SHIFT) & ((1UL << L4_PAGETABLE_SHIFT) - 1)) )
+        {
+            UNMAP_DOMAIN_PAGE(l3src);
+            UNMAP_DOMAIN_PAGE(l3dst);
+        }
         next = mfn + (1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT));
         if ( !is_valid(mfn, min(next, end)) )
             continue;
-        if ( !(l4e_get_flags(l4e) & _PAGE_PRESENT) )
+        if ( !l3dst )
         {
-            l3dst = alloc_xen_pagetable();
-            BUG_ON(!l3dst);
-            clear_page(l3dst);
-            efi_l4_pgtable[l4_table_offset(mfn << PAGE_SHIFT)] =
-                l4e_from_paddr(virt_to_maddr(l3dst), __PAGE_HYPERVISOR);
+            if ( !(l4e_get_flags(l4e) & _PAGE_PRESENT) )
+            {
+                mfn_t l3mfn;
+
+                l3dst = alloc_map_clear_xen_pt(&l3mfn);
+                BUG_ON(!l3dst);
+                efi_l4_pgtable[l4_table_offset(mfn << PAGE_SHIFT)] =
+                    l4e_from_mfn(l3mfn, __PAGE_HYPERVISOR);
+            }
+            else
+                l3dst = map_l3t_from_l4e(l4e);
         }
-        else
-            l3dst = l4e_to_l3e(l4e);
-        l3src = l4e_to_l3e(idle_pg_table[l4_table_offset(va)]);
+        if ( !l3src )
+            l3src = map_l3t_from_l4e(idle_pg_table[l4_table_offset(va)]);
         l3dst[l3_table_offset(mfn << PAGE_SHIFT)] = l3src[l3_table_offset(va)];
     }
+
+    unmap_domain_page(l3src);
+    unmap_domain_page(l3dst);
 }
 
 static bool __init ram_range_valid(unsigned long smfn, unsigned long emfn)
-- 
2.24.1.AMZN




* [PATCH v7 10/15] efi: switch to new APIs in EFI code
  2020-05-29 11:11 [PATCH v7 00/15] switch to domheap for Xen page tables Hongyan Xia
                   ` (8 preceding siblings ...)
  2020-05-29 11:11 ` [PATCH v7 09/15] efi: use new page table APIs in copy_mapping Hongyan Xia
@ 2020-05-29 11:11 ` Hongyan Xia
  2020-07-15 13:06   ` Jan Beulich
  2020-05-29 11:11 ` [PATCH v7 11/15] x86/smpboot: add exit path for clone_mapping() Hongyan Xia
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Hongyan Xia @ 2020-05-29 11:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, julien, Wei Liu, Jan Beulich, Roger Pau Monné

From: Wei Liu <wei.liu2@citrix.com>

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

---
Changed in v7:
- add blank line after declaration.
- rename efi_l4_pgtable into efi_l4t.
- pass the mapped efi_l4t to copy_mapping() instead of map it again.
- use the alloc_map_clear_xen_pt() API.
- unmap pl3e, pl2e, l1t earlier.
---
 xen/arch/x86/efi/runtime.h | 13 ++++++---
 xen/common/efi/boot.c      | 55 ++++++++++++++++++++++----------------
 xen/common/efi/efi.h       |  3 ++-
 xen/common/efi/runtime.c   |  8 +++---
 4 files changed, 48 insertions(+), 31 deletions(-)

diff --git a/xen/arch/x86/efi/runtime.h b/xen/arch/x86/efi/runtime.h
index d9eb8f5c27..77866c5f21 100644
--- a/xen/arch/x86/efi/runtime.h
+++ b/xen/arch/x86/efi/runtime.h
@@ -1,12 +1,19 @@
+#include <xen/domain_page.h>
+#include <xen/mm.h>
 #include <asm/atomic.h>
 #include <asm/mc146818rtc.h>
 
 #ifndef COMPAT
-l4_pgentry_t *__read_mostly efi_l4_pgtable;
+mfn_t __read_mostly efi_l4_mfn = INVALID_MFN_INITIALIZER;
 
 void efi_update_l4_pgtable(unsigned int l4idx, l4_pgentry_t l4e)
 {
-    if ( efi_l4_pgtable )
-        l4e_write(efi_l4_pgtable + l4idx, l4e);
+    if ( !mfn_eq(efi_l4_mfn, INVALID_MFN) )
+    {
+        l4_pgentry_t *efi_l4t = map_domain_page(efi_l4_mfn);
+
+        l4e_write(efi_l4t + l4idx, l4e);
+        unmap_domain_page(efi_l4t);
+    }
 }
 #endif
diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c
index 2599ae50d2..74e23fbac8 100644
--- a/xen/common/efi/boot.c
+++ b/xen/common/efi/boot.c
@@ -1440,14 +1440,15 @@ custom_param("efi", parse_efi_param);
 
 static __init void copy_mapping(unsigned long mfn, unsigned long end,
                                 bool (*is_valid)(unsigned long smfn,
-                                                 unsigned long emfn))
+                                                 unsigned long emfn),
+                                l4_pgentry_t *efi_l4t)
 {
     unsigned long next;
     l3_pgentry_t *l3src = NULL, *l3dst = NULL;
 
     for ( ; mfn < end; mfn = next )
     {
-        l4_pgentry_t l4e = efi_l4_pgtable[l4_table_offset(mfn << PAGE_SHIFT)];
+        l4_pgentry_t l4e = efi_l4t[l4_table_offset(mfn << PAGE_SHIFT)];
         unsigned long va = (unsigned long)mfn_to_virt(mfn);
 
         if ( !((mfn << PAGE_SHIFT) & ((1UL << L4_PAGETABLE_SHIFT) - 1)) )
@@ -1466,7 +1467,7 @@ static __init void copy_mapping(unsigned long mfn, unsigned long end,
 
                 l3dst = alloc_map_clear_xen_pt(&l3mfn);
                 BUG_ON(!l3dst);
-                efi_l4_pgtable[l4_table_offset(mfn << PAGE_SHIFT)] =
+                efi_l4t[l4_table_offset(mfn << PAGE_SHIFT)] =
                     l4e_from_mfn(l3mfn, __PAGE_HYPERVISOR);
             }
             else
@@ -1499,6 +1500,7 @@ static bool __init rt_range_valid(unsigned long smfn, unsigned long emfn)
 void __init efi_init_memory(void)
 {
     unsigned int i;
+    l4_pgentry_t *efi_l4t;
     struct rt_extra {
         struct rt_extra *next;
         unsigned long smfn, emfn;
@@ -1613,11 +1615,10 @@ void __init efi_init_memory(void)
      * Set up 1:1 page tables for runtime calls. See SetVirtualAddressMap() in
      * efi_exit_boot().
      */
-    efi_l4_pgtable = alloc_xen_pagetable();
-    BUG_ON(!efi_l4_pgtable);
-    clear_page(efi_l4_pgtable);
+    efi_l4t = alloc_map_clear_xen_pt(&efi_l4_mfn);
+    BUG_ON(!efi_l4t);
 
-    copy_mapping(0, max_page, ram_range_valid);
+    copy_mapping(0, max_page, ram_range_valid, efi_l4t);
 
     /* Insert non-RAM runtime mappings inside the direct map. */
     for ( i = 0; i < efi_memmap_size; i += efi_mdesc_size )
@@ -1633,58 +1634,64 @@ void __init efi_init_memory(void)
             copy_mapping(PFN_DOWN(desc->PhysicalStart),
                          PFN_UP(desc->PhysicalStart +
                                 (desc->NumberOfPages << EFI_PAGE_SHIFT)),
-                         rt_range_valid);
+                         rt_range_valid, efi_l4t);
     }
 
     /* Insert non-RAM runtime mappings outside of the direct map. */
     while ( (extra = extra_head) != NULL )
     {
         unsigned long addr = extra->smfn << PAGE_SHIFT;
-        l4_pgentry_t l4e = efi_l4_pgtable[l4_table_offset(addr)];
+        l4_pgentry_t l4e = efi_l4t[l4_table_offset(addr)];
         l3_pgentry_t *pl3e;
         l2_pgentry_t *pl2e;
         l1_pgentry_t *l1t;
 
         if ( !(l4e_get_flags(l4e) & _PAGE_PRESENT) )
         {
-            pl3e = alloc_xen_pagetable();
+            mfn_t l3mfn;
+
+            pl3e = alloc_map_clear_xen_pt(&l3mfn);
             BUG_ON(!pl3e);
-            clear_page(pl3e);
-            efi_l4_pgtable[l4_table_offset(addr)] =
-                l4e_from_paddr(virt_to_maddr(pl3e), __PAGE_HYPERVISOR);
+            efi_l4t[l4_table_offset(addr)] =
+                l4e_from_mfn(l3mfn, __PAGE_HYPERVISOR);
         }
         else
-            pl3e = l4e_to_l3e(l4e);
+            pl3e = map_l3t_from_l4e(l4e);
         pl3e += l3_table_offset(addr);
         if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
         {
-            pl2e = alloc_xen_pagetable();
+            mfn_t l2mfn;
+
+            pl2e = alloc_map_clear_xen_pt(&l2mfn);
             BUG_ON(!pl2e);
-            clear_page(pl2e);
-            *pl3e = l3e_from_paddr(virt_to_maddr(pl2e), __PAGE_HYPERVISOR);
+            *pl3e = l3e_from_mfn(l2mfn, __PAGE_HYPERVISOR);
         }
         else
         {
             BUG_ON(l3e_get_flags(*pl3e) & _PAGE_PSE);
-            pl2e = l3e_to_l2e(*pl3e);
+            pl2e = map_l2t_from_l3e(*pl3e);
         }
+        UNMAP_DOMAIN_PAGE(pl3e);
         pl2e += l2_table_offset(addr);
         if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
         {
-            l1t = alloc_xen_pagetable();
+            mfn_t l1mfn;
+
+            l1t = alloc_map_clear_xen_pt(&l1mfn);
             BUG_ON(!l1t);
-            clear_page(l1t);
-            *pl2e = l2e_from_paddr(virt_to_maddr(l1t), __PAGE_HYPERVISOR);
+            *pl2e = l2e_from_mfn(l1mfn, __PAGE_HYPERVISOR);
         }
         else
         {
             BUG_ON(l2e_get_flags(*pl2e) & _PAGE_PSE);
-            l1t = l2e_to_l1e(*pl2e);
+            l1t = map_l1t_from_l2e(*pl2e);
         }
+        UNMAP_DOMAIN_PAGE(pl2e);
         for ( i = l1_table_offset(addr);
               i < L1_PAGETABLE_ENTRIES && extra->smfn < extra->emfn;
               ++i, ++extra->smfn )
             l1t[i] = l1e_from_pfn(extra->smfn, extra->prot);
+        UNMAP_DOMAIN_PAGE(l1t);
 
         if ( extra->smfn == extra->emfn )
         {
@@ -1696,6 +1703,8 @@ void __init efi_init_memory(void)
     /* Insert Xen mappings. */
     for ( i = l4_table_offset(HYPERVISOR_VIRT_START);
           i < l4_table_offset(DIRECTMAP_VIRT_END); ++i )
-        efi_l4_pgtable[i] = idle_pg_table[i];
+        efi_l4t[i] = idle_pg_table[i];
+
+    unmap_domain_page(efi_l4t);
 }
 #endif
diff --git a/xen/common/efi/efi.h b/xen/common/efi/efi.h
index 2e38d05f3d..e364bae3e0 100644
--- a/xen/common/efi/efi.h
+++ b/xen/common/efi/efi.h
@@ -6,6 +6,7 @@
 #include <efi/eficapsule.h>
 #include <efi/efiapi.h>
 #include <xen/efi.h>
+#include <xen/mm.h>
 #include <xen/spinlock.h>
 #include <asm/page.h>
 
@@ -29,7 +30,7 @@ extern UINTN efi_memmap_size, efi_mdesc_size;
 extern void *efi_memmap;
 
 #ifdef CONFIG_X86
-extern l4_pgentry_t *efi_l4_pgtable;
+extern mfn_t efi_l4_mfn;
 #endif
 
 extern const struct efi_pci_rom *efi_pci_roms;
diff --git a/xen/common/efi/runtime.c b/xen/common/efi/runtime.c
index 95367694b5..375b94229e 100644
--- a/xen/common/efi/runtime.c
+++ b/xen/common/efi/runtime.c
@@ -85,7 +85,7 @@ struct efi_rs_state efi_rs_enter(void)
     static const u32 mxcsr = MXCSR_DEFAULT;
     struct efi_rs_state state = { .cr3 = 0 };
 
-    if ( !efi_l4_pgtable )
+    if ( mfn_eq(efi_l4_mfn, INVALID_MFN) )
         return state;
 
     state.cr3 = read_cr3();
@@ -111,7 +111,7 @@ struct efi_rs_state efi_rs_enter(void)
         lgdt(&gdt_desc);
     }
 
-    switch_cr3_cr4(virt_to_maddr(efi_l4_pgtable), read_cr4());
+    switch_cr3_cr4(mfn_to_maddr(efi_l4_mfn), read_cr4());
 
     return state;
 }
@@ -140,9 +140,9 @@ void efi_rs_leave(struct efi_rs_state *state)
 
 bool efi_rs_using_pgtables(void)
 {
-    return efi_l4_pgtable &&
+    return !mfn_eq(efi_l4_mfn, INVALID_MFN) &&
            (smp_processor_id() == efi_rs_on_cpu) &&
-           (read_cr3() == virt_to_maddr(efi_l4_pgtable));
+           (read_cr3() == mfn_to_maddr(efi_l4_mfn));
 }
 
 unsigned long efi_get_time(void)
-- 
2.24.1.AMZN



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v7 11/15] x86/smpboot: add exit path for clone_mapping()
  2020-05-29 11:11 [PATCH v7 00/15] switch to domheap for Xen page tables Hongyan Xia
                   ` (9 preceding siblings ...)
  2020-05-29 11:11 ` [PATCH v7 10/15] efi: switch to new APIs in EFI code Hongyan Xia
@ 2020-05-29 11:11 ` Hongyan Xia
  2020-05-29 11:11 ` [PATCH v7 12/15] x86/smpboot: switch clone_mapping() to new APIs Hongyan Xia
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 32+ messages in thread
From: Hongyan Xia @ 2020-05-29 11:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, julien, Wei Liu, Jan Beulich, Roger Pau Monné

From: Wei Liu <wei.liu2@citrix.com>

We will soon need to clean up page table mappings in the exit path.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

---
Changed in v7:
- edit commit message.
- begin with rc = 0 and set it to -ENOMEM ahead of if().
---
 xen/arch/x86/smpboot.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 170ab24e66..43abf2a332 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -674,6 +674,7 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
     l3_pgentry_t *pl3e;
     l2_pgentry_t *pl2e;
     l1_pgentry_t *pl1e;
+    int rc = 0;
 
     /*
      * Sanity check 'linear'.  We only allow cloning from the Xen virtual
@@ -714,7 +715,7 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
             pl1e = l2e_to_l1e(*pl2e) + l1_table_offset(linear);
             flags = l1e_get_flags(*pl1e);
             if ( !(flags & _PAGE_PRESENT) )
-                return 0;
+                goto out;
             pfn = l1e_get_pfn(*pl1e);
         }
     }
@@ -722,8 +723,9 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
     if ( !(root_get_flags(rpt[root_table_offset(linear)]) & _PAGE_PRESENT) )
     {
         pl3e = alloc_xen_pagetable();
+        rc = -ENOMEM;
         if ( !pl3e )
-            return -ENOMEM;
+            goto out;
         clear_page(pl3e);
         l4e_write(&rpt[root_table_offset(linear)],
                   l4e_from_paddr(__pa(pl3e), __PAGE_HYPERVISOR));
@@ -736,8 +738,9 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
     if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
     {
         pl2e = alloc_xen_pagetable();
+        rc = -ENOMEM;
         if ( !pl2e )
-            return -ENOMEM;
+            goto out;
         clear_page(pl2e);
         l3e_write(pl3e, l3e_from_paddr(__pa(pl2e), __PAGE_HYPERVISOR));
     }
@@ -752,8 +755,9 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
     if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
     {
         pl1e = alloc_xen_pagetable();
+        rc = -ENOMEM;
         if ( !pl1e )
-            return -ENOMEM;
+            goto out;
         clear_page(pl1e);
         l2e_write(pl2e, l2e_from_paddr(__pa(pl1e), __PAGE_HYPERVISOR));
     }
@@ -774,7 +778,9 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
     else
         l1e_write(pl1e, l1e_from_pfn(pfn, flags));
 
-    return 0;
+    rc = 0;
+ out:
+    return rc;
 }
 
 DEFINE_PER_CPU(root_pgentry_t *, root_pgt);
-- 
2.24.1.AMZN



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v7 12/15] x86/smpboot: switch clone_mapping() to new APIs
  2020-05-29 11:11 [PATCH v7 00/15] switch to domheap for Xen page tables Hongyan Xia
                   ` (10 preceding siblings ...)
  2020-05-29 11:11 ` [PATCH v7 11/15] x86/smpboot: add exit path for clone_mapping() Hongyan Xia
@ 2020-05-29 11:11 ` Hongyan Xia
  2020-07-15 13:11   ` Jan Beulich
  2020-05-29 11:11 ` [PATCH v7 13/15] x86/mm: drop old page table APIs Hongyan Xia
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 32+ messages in thread
From: Hongyan Xia @ 2020-05-29 11:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, julien, Wei Liu, Jan Beulich, Roger Pau Monné

From: Wei Liu <wei.liu2@citrix.com>

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

---
Changed in v7:
- change patch title
- remove initialiser of pl3e.
- combine the initialisation of pl3e into a single assignment.
- use the new alloc_map_clear() helper.
- use the normal map_domain_page() in the error path.
---
 xen/arch/x86/smpboot.c | 44 ++++++++++++++++++++++++++----------------
 1 file changed, 27 insertions(+), 17 deletions(-)

diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 43abf2a332..186211ccc9 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -672,8 +672,8 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
     unsigned long linear = (unsigned long)ptr, pfn;
     unsigned int flags;
     l3_pgentry_t *pl3e;
-    l2_pgentry_t *pl2e;
-    l1_pgentry_t *pl1e;
+    l2_pgentry_t *pl2e = NULL;
+    l1_pgentry_t *pl1e = NULL;
     int rc = 0;
 
     /*
@@ -688,7 +688,7 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
          (linear >= XEN_VIRT_END && linear < DIRECTMAP_VIRT_START) )
         return -EINVAL;
 
-    pl3e = l4e_to_l3e(idle_pg_table[root_table_offset(linear)]) +
+    pl3e = map_l3t_from_l4e(idle_pg_table[root_table_offset(linear)]) +
         l3_table_offset(linear);
 
     flags = l3e_get_flags(*pl3e);
@@ -701,7 +701,7 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
     }
     else
     {
-        pl2e = l3e_to_l2e(*pl3e) + l2_table_offset(linear);
+        pl2e = map_l2t_from_l3e(*pl3e) + l2_table_offset(linear);
         flags = l2e_get_flags(*pl2e);
         ASSERT(flags & _PAGE_PRESENT);
         if ( flags & _PAGE_PSE )
@@ -712,7 +712,7 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
         }
         else
         {
-            pl1e = l2e_to_l1e(*pl2e) + l1_table_offset(linear);
+            pl1e = map_l1t_from_l2e(*pl2e) + l1_table_offset(linear);
             flags = l1e_get_flags(*pl1e);
             if ( !(flags & _PAGE_PRESENT) )
                 goto out;
@@ -720,51 +720,58 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
         }
     }
 
+    UNMAP_DOMAIN_PAGE(pl1e);
+    UNMAP_DOMAIN_PAGE(pl2e);
+    UNMAP_DOMAIN_PAGE(pl3e);
+
     if ( !(root_get_flags(rpt[root_table_offset(linear)]) & _PAGE_PRESENT) )
     {
-        pl3e = alloc_xen_pagetable();
+        mfn_t l3mfn;
+
+        pl3e = alloc_map_clear_xen_pt(&l3mfn);
         rc = -ENOMEM;
         if ( !pl3e )
             goto out;
-        clear_page(pl3e);
         l4e_write(&rpt[root_table_offset(linear)],
-                  l4e_from_paddr(__pa(pl3e), __PAGE_HYPERVISOR));
+                  l4e_from_mfn(l3mfn, __PAGE_HYPERVISOR));
     }
     else
-        pl3e = l4e_to_l3e(rpt[root_table_offset(linear)]);
+        pl3e = map_l3t_from_l4e(rpt[root_table_offset(linear)]);
 
     pl3e += l3_table_offset(linear);
 
     if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
     {
-        pl2e = alloc_xen_pagetable();
+        mfn_t l2mfn;
+
+        pl2e = alloc_map_clear_xen_pt(&l2mfn);
         rc = -ENOMEM;
         if ( !pl2e )
             goto out;
-        clear_page(pl2e);
-        l3e_write(pl3e, l3e_from_paddr(__pa(pl2e), __PAGE_HYPERVISOR));
+        l3e_write(pl3e, l3e_from_mfn(l2mfn, __PAGE_HYPERVISOR));
     }
     else
     {
         ASSERT(!(l3e_get_flags(*pl3e) & _PAGE_PSE));
-        pl2e = l3e_to_l2e(*pl3e);
+        pl2e = map_l2t_from_l3e(*pl3e);
     }
 
     pl2e += l2_table_offset(linear);
 
     if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
     {
-        pl1e = alloc_xen_pagetable();
+        mfn_t l1mfn;
+
+        pl1e = alloc_map_clear_xen_pt(&l1mfn);
         rc = -ENOMEM;
         if ( !pl1e )
             goto out;
-        clear_page(pl1e);
-        l2e_write(pl2e, l2e_from_paddr(__pa(pl1e), __PAGE_HYPERVISOR));
+        l2e_write(pl2e, l2e_from_mfn(l1mfn, __PAGE_HYPERVISOR));
     }
     else
     {
         ASSERT(!(l2e_get_flags(*pl2e) & _PAGE_PSE));
-        pl1e = l2e_to_l1e(*pl2e);
+        pl1e = map_l1t_from_l2e(*pl2e);
     }
 
     pl1e += l1_table_offset(linear);
@@ -780,6 +787,9 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
 
     rc = 0;
  out:
+    unmap_domain_page(pl1e);
+    unmap_domain_page(pl2e);
+    unmap_domain_page(pl3e);
     return rc;
 }
 
-- 
2.24.1.AMZN



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v7 13/15] x86/mm: drop old page table APIs
  2020-05-29 11:11 [PATCH v7 00/15] switch to domheap for Xen page tables Hongyan Xia
                   ` (11 preceding siblings ...)
  2020-05-29 11:11 ` [PATCH v7 12/15] x86/smpboot: switch clone_mapping() to new APIs Hongyan Xia
@ 2020-05-29 11:11 ` Hongyan Xia
  2020-05-29 11:11 ` [PATCH v7 14/15] x86: switch to use domheap page for page tables Hongyan Xia
  2020-05-29 11:11 ` [PATCH v7 15/15] x86/mm: drop _new suffix for page table APIs Hongyan Xia
  14 siblings, 0 replies; 32+ messages in thread
From: Hongyan Xia @ 2020-05-29 11:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, julien, Wei Liu, Jan Beulich, Roger Pau Monné

From: Hongyan Xia <hongyxia@amazon.com>

Two sets of old APIs, alloc/free_xen_pagetable() and lXe_to_lYe(), are
now dropped to avoid the dependency on the direct map.

There are two special cases which still have not been rewritten into
the new APIs and thus need special treatment:

rpt in smpboot.c cannot use ephemeral mappings yet. The problem is that
rpt is read and written in context switch code, but the mapping
infrastructure is NOT context-switch-safe, meaning we cannot map rpt in
one domain and unmap in another. Before the mapping infrastructure
supports context switches, rpt has to be globally mapped.

Also, lXe_to_lYe() during Xen image relocation cannot be converted into
map/unmap pairs. We cannot hold on to mappings while the mapping
infrastructure is being relocated! It is enough to remove the direct map
in the second e820 pass, so we still use the direct map (<4GiB) in Xen
relocation (which is during the first e820 pass).
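
(For illustration only; rpt_mfn below is just a placeholder name. The
pattern rpt would otherwise need is roughly:

    rpt = map_domain_page(rpt_mfn);   /* mapped in one context...       */
    /* ... context switch, possibly to another vCPU/pCPU ... */
    unmap_domain_page(rpt);           /* ...unmapped in a different one */

which the mapping infrastructure cannot handle today, hence rpt stays on
a globally mapped xenheap page for now.)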

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/mm.c          | 14 --------------
 xen/arch/x86/setup.c       |  4 ++--
 xen/arch/x86/smpboot.c     |  4 ++--
 xen/include/asm-x86/mm.h   |  2 --
 xen/include/asm-x86/page.h |  5 -----
 5 files changed, 4 insertions(+), 25 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 26694e2f30..38cfa3ce25 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4908,20 +4908,6 @@ int mmcfg_intercept_write(
     return X86EMUL_OKAY;
 }
 
-void *alloc_xen_pagetable(void)
-{
-    mfn_t mfn = alloc_xen_pagetable_new();
-
-    return mfn_eq(mfn, INVALID_MFN) ? NULL : mfn_to_virt(mfn_x(mfn));
-}
-
-void free_xen_pagetable(void *v)
-{
-    mfn_t mfn = v ? virt_to_mfn(v) : INVALID_MFN;
-
-    free_xen_pagetable_new(mfn);
-}
-
 /*
  * For these PTE APIs, the caller must follow the alloc-map-unmap-free
  * lifecycle, which means explicitly mapping the PTE pages before accessing
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 2dec7a3fc6..b08b69ff5d 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1190,7 +1190,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
                     continue;
                 *pl4e = l4e_from_intpte(l4e_get_intpte(*pl4e) +
                                         xen_phys_start);
-                pl3e = l4e_to_l3e(*pl4e);
+                pl3e = __va(l4e_get_paddr(*pl4e));
                 for ( j = 0; j < L3_PAGETABLE_ENTRIES; j++, pl3e++ )
                 {
                     /* Not present, 1GB mapping, or already relocated? */
@@ -1200,7 +1200,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
                         continue;
                     *pl3e = l3e_from_intpte(l3e_get_intpte(*pl3e) +
                                             xen_phys_start);
-                    pl2e = l3e_to_l2e(*pl3e);
+                    pl2e = __va(l3e_get_paddr(*pl3e));
                     for ( k = 0; k < L2_PAGETABLE_ENTRIES; k++, pl2e++ )
                     {
                         /* Not present, PSE, or already relocated? */
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 186211ccc9..9cdf198fd6 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -808,7 +808,7 @@ static int setup_cpu_root_pgt(unsigned int cpu)
     if ( !opt_xpti_hwdom && !opt_xpti_domu )
         return 0;
 
-    rpt = alloc_xen_pagetable();
+    rpt = alloc_xenheap_page();
     if ( !rpt )
         return -ENOMEM;
 
@@ -912,7 +912,7 @@ static void cleanup_cpu_root_pgt(unsigned int cpu)
         free_xen_pagetable_new(l3mfn);
     }
 
-    free_xen_pagetable(rpt);
+    free_xenheap_page(rpt);
 
     /* Also zap the stub mapping for this CPU. */
     if ( stub_linear )
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 42d1a78731..2ac0cdab83 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -583,8 +583,6 @@ int vcpu_destroy_pagetables(struct vcpu *);
 void *do_page_walk(struct vcpu *v, unsigned long addr);
 
 /* Allocator functions for Xen pagetables. */
-void *alloc_xen_pagetable(void);
-void free_xen_pagetable(void *v);
 mfn_t alloc_xen_pagetable_new(void);
 void free_xen_pagetable_new(mfn_t mfn);
 void *alloc_map_clear_xen_pt(mfn_t *pmfn);
diff --git a/xen/include/asm-x86/page.h b/xen/include/asm-x86/page.h
index 3854feb3ea..ec66ad8df6 100644
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -188,11 +188,6 @@ static inline l4_pgentry_t l4e_from_paddr(paddr_t pa, unsigned int flags)
 #define l4e_has_changed(x,y,flags) \
     ( !!(((x).l4 ^ (y).l4) & ((PADDR_MASK&PAGE_MASK)|put_pte_flags(flags))) )
 
-/* Pagetable walking. */
-#define l2e_to_l1e(x)              ((l1_pgentry_t *)__va(l2e_get_paddr(x)))
-#define l3e_to_l2e(x)              ((l2_pgentry_t *)__va(l3e_get_paddr(x)))
-#define l4e_to_l3e(x)              ((l3_pgentry_t *)__va(l4e_get_paddr(x)))
-
 #define map_l1t_from_l2e(x)        (l1_pgentry_t *)map_domain_page(l2e_get_mfn(x))
 #define map_l2t_from_l3e(x)        (l2_pgentry_t *)map_domain_page(l3e_get_mfn(x))
 #define map_l3t_from_l4e(x)        (l3_pgentry_t *)map_domain_page(l4e_get_mfn(x))
-- 
2.24.1.AMZN



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v7 14/15] x86: switch to use domheap page for page tables
  2020-05-29 11:11 [PATCH v7 00/15] switch to domheap for Xen page tables Hongyan Xia
                   ` (12 preceding siblings ...)
  2020-05-29 11:11 ` [PATCH v7 13/15] x86/mm: drop old page table APIs Hongyan Xia
@ 2020-05-29 11:11 ` Hongyan Xia
  2020-07-15 13:16   ` Jan Beulich
  2020-05-29 11:11 ` [PATCH v7 15/15] x86/mm: drop _new suffix for page table APIs Hongyan Xia
  14 siblings, 1 reply; 32+ messages in thread
From: Hongyan Xia @ 2020-05-29 11:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, julien, Wei Liu, Jan Beulich, Roger Pau Monné

From: Hongyan Xia <hongyxia@amazon.com>

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
---
 xen/arch/x86/mm.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 38cfa3ce25..16f1aa3344 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4918,10 +4918,11 @@ mfn_t alloc_xen_pagetable_new(void)
 {
     if ( system_state != SYS_STATE_early_boot )
     {
-        void *ptr = alloc_xenheap_page();
 
-        BUG_ON(!hardware_domain && !ptr);
-        return ptr ? virt_to_mfn(ptr) : INVALID_MFN;
+        struct page_info *pg = alloc_domheap_page(NULL, 0);
+
+        BUG_ON(!hardware_domain && !pg);
+        return pg ? page_to_mfn(pg) : INVALID_MFN;
     }
 
     return alloc_boot_pages(1, 1);
@@ -4931,7 +4932,7 @@ mfn_t alloc_xen_pagetable_new(void)
 void free_xen_pagetable_new(mfn_t mfn)
 {
     if ( system_state != SYS_STATE_early_boot && !mfn_eq(mfn, INVALID_MFN) )
-        free_xenheap_page(mfn_to_virt(mfn_x(mfn)));
+        free_domheap_page(mfn_to_page(mfn));
 }
 
 void *alloc_map_clear_xen_pt(mfn_t *pmfn)
-- 
2.24.1.AMZN



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v7 15/15] x86/mm: drop _new suffix for page table APIs
  2020-05-29 11:11 [PATCH v7 00/15] switch to domheap for Xen page tables Hongyan Xia
                   ` (13 preceding siblings ...)
  2020-05-29 11:11 ` [PATCH v7 14/15] x86: switch to use domheap page for page tables Hongyan Xia
@ 2020-05-29 11:11 ` Hongyan Xia
  14 siblings, 0 replies; 32+ messages in thread
From: Hongyan Xia @ 2020-05-29 11:11 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, julien, Wei Liu, Jan Beulich, Roger Pau Monné

From: Wei Liu <wei.liu2@citrix.com>

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/mm.c        | 44 ++++++++++++++++++++--------------------
 xen/arch/x86/smpboot.c   |  6 +++---
 xen/arch/x86/x86_64/mm.c |  2 +-
 xen/include/asm-x86/mm.h |  4 ++--
 4 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 16f1aa3344..bf35a1a998 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -356,7 +356,7 @@ void __init arch_init_memory(void)
             ASSERT(root_pgt_pv_xen_slots < ROOT_PAGETABLE_PV_XEN_SLOTS);
             if ( l4_table_offset(split_va) == l4_table_offset(split_va - 1) )
             {
-                mfn_t l3mfn = alloc_xen_pagetable_new();
+                mfn_t l3mfn = alloc_xen_pagetable();
 
                 if ( !mfn_eq(l3mfn, INVALID_MFN) )
                 {
@@ -4914,7 +4914,7 @@ int mmcfg_intercept_write(
  * them. The caller must check whether the allocation has succeeded, and only
  * pass valid MFNs to map_domain_page().
  */
-mfn_t alloc_xen_pagetable_new(void)
+mfn_t alloc_xen_pagetable(void)
 {
     if ( system_state != SYS_STATE_early_boot )
     {
@@ -4929,7 +4929,7 @@ mfn_t alloc_xen_pagetable_new(void)
 }
 
 /* mfn can be INVALID_MFN */
-void free_xen_pagetable_new(mfn_t mfn)
+void free_xen_pagetable(mfn_t mfn)
 {
     if ( system_state != SYS_STATE_early_boot && !mfn_eq(mfn, INVALID_MFN) )
         free_domheap_page(mfn_to_page(mfn));
@@ -4937,7 +4937,7 @@ void free_xen_pagetable_new(mfn_t mfn)
 
 void *alloc_map_clear_xen_pt(mfn_t *pmfn)
 {
-    mfn_t mfn = alloc_xen_pagetable_new();
+    mfn_t mfn = alloc_xen_pagetable();
     void *ret;
 
     if ( mfn_eq(mfn, INVALID_MFN) )
@@ -4983,7 +4983,7 @@ static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)
         }
         if ( locking )
             spin_unlock(&map_pgdir_lock);
-        free_xen_pagetable_new(l3mfn);
+        free_xen_pagetable(l3mfn);
     }
 
     return map_l3t_from_l4e(*pl4e) + l3_table_offset(v);
@@ -5018,7 +5018,7 @@ static l2_pgentry_t *virt_to_xen_l2e(unsigned long v)
         }
         if ( locking )
             spin_unlock(&map_pgdir_lock);
-        free_xen_pagetable_new(l2mfn);
+        free_xen_pagetable(l2mfn);
     }
 
     BUG_ON(l3e_get_flags(*pl3e) & _PAGE_PSE);
@@ -5057,7 +5057,7 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
         }
         if ( locking )
             spin_unlock(&map_pgdir_lock);
-        free_xen_pagetable_new(l1mfn);
+        free_xen_pagetable(l1mfn);
     }
 
     BUG_ON(l2e_get_flags(*pl2e) & _PAGE_PSE);
@@ -5166,10 +5166,10 @@ int map_pages_to_xen(
                         ol2e = l2t[i];
                         if ( (l2e_get_flags(ol2e) & _PAGE_PRESENT) &&
                              !(l2e_get_flags(ol2e) & _PAGE_PSE) )
-                            free_xen_pagetable_new(l2e_get_mfn(ol2e));
+                            free_xen_pagetable(l2e_get_mfn(ol2e));
                     }
                     unmap_domain_page(l2t);
-                    free_xen_pagetable_new(l3e_get_mfn(ol3e));
+                    free_xen_pagetable(l3e_get_mfn(ol3e));
                 }
             }
 
@@ -5208,7 +5208,7 @@ int map_pages_to_xen(
                 continue;
             }
 
-            l2mfn = alloc_xen_pagetable_new();
+            l2mfn = alloc_xen_pagetable();
             if ( mfn_eq(l2mfn, INVALID_MFN) )
                 goto out;
 
@@ -5236,7 +5236,7 @@ int map_pages_to_xen(
                 spin_unlock(&map_pgdir_lock);
             flush_area(virt, flush_flags);
 
-            free_xen_pagetable_new(l2mfn);
+            free_xen_pagetable(l2mfn);
         }
 
         pl2e = virt_to_xen_l2e(virt);
@@ -5270,7 +5270,7 @@ int map_pages_to_xen(
                         flush_flags(l1e_get_flags(l1t[i]));
                     flush_area(virt, flush_flags);
                     unmap_domain_page(l1t);
-                    free_xen_pagetable_new(l2e_get_mfn(ol2e));
+                    free_xen_pagetable(l2e_get_mfn(ol2e));
                 }
             }
 
@@ -5315,7 +5315,7 @@ int map_pages_to_xen(
                     goto check_l3;
                 }
 
-                l1mfn = alloc_xen_pagetable_new();
+                l1mfn = alloc_xen_pagetable();
                 if ( mfn_eq(l1mfn, INVALID_MFN) )
                     goto out;
 
@@ -5342,7 +5342,7 @@ int map_pages_to_xen(
                     spin_unlock(&map_pgdir_lock);
                 flush_area(virt, flush_flags);
 
-                free_xen_pagetable_new(l1mfn);
+                free_xen_pagetable(l1mfn);
             }
 
             pl1e  = map_l1t_from_l2e(*pl2e) + l1_table_offset(virt);
@@ -5408,7 +5408,7 @@ int map_pages_to_xen(
                     flush_area(virt - PAGE_SIZE,
                                FLUSH_TLB_GLOBAL |
                                FLUSH_ORDER(PAGETABLE_ORDER));
-                    free_xen_pagetable_new(l2e_get_mfn(ol2e));
+                    free_xen_pagetable(l2e_get_mfn(ol2e));
                 }
                 else if ( locking )
                     spin_unlock(&map_pgdir_lock);
@@ -5459,7 +5459,7 @@ int map_pages_to_xen(
                 flush_area(virt - PAGE_SIZE,
                            FLUSH_TLB_GLOBAL |
                            FLUSH_ORDER(2*PAGETABLE_ORDER));
-                free_xen_pagetable_new(l3e_get_mfn(ol3e));
+                free_xen_pagetable(l3e_get_mfn(ol3e));
             }
             else if ( locking )
                 spin_unlock(&map_pgdir_lock);
@@ -5548,7 +5548,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
             }
 
             /* PAGE1GB: shatter the superpage and fall through. */
-            l2mfn = alloc_xen_pagetable_new();
+            l2mfn = alloc_xen_pagetable();
             if ( mfn_eq(l2mfn, INVALID_MFN) )
                 goto out;
 
@@ -5572,7 +5572,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
             if ( locking )
                 spin_unlock(&map_pgdir_lock);
 
-            free_xen_pagetable_new(l2mfn);
+            free_xen_pagetable(l2mfn);
         }
 
         /*
@@ -5608,7 +5608,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
             {
                 l1_pgentry_t *l1t;
                 /* PSE: shatter the superpage and try again. */
-                mfn_t l1mfn = alloc_xen_pagetable_new();
+                mfn_t l1mfn = alloc_xen_pagetable();
 
                 if ( mfn_eq(l1mfn, INVALID_MFN) )
                     goto out;
@@ -5632,7 +5632,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                 if ( locking )
                     spin_unlock(&map_pgdir_lock);
 
-                free_xen_pagetable_new(l1mfn);
+                free_xen_pagetable(l1mfn);
             }
         }
         else
@@ -5699,7 +5699,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                 if ( locking )
                     spin_unlock(&map_pgdir_lock);
                 flush_area(NULL, FLUSH_TLB_GLOBAL); /* flush before free */
-                free_xen_pagetable_new(l1mfn);
+                free_xen_pagetable(l1mfn);
             }
             else if ( locking )
                 spin_unlock(&map_pgdir_lock);
@@ -5744,7 +5744,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                 if ( locking )
                     spin_unlock(&map_pgdir_lock);
                 flush_area(NULL, FLUSH_TLB_GLOBAL); /* flush before free */
-                free_xen_pagetable_new(l2mfn);
+                free_xen_pagetable(l2mfn);
             }
             else if ( locking )
                 spin_unlock(&map_pgdir_lock);
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 9cdf198fd6..a20540e36a 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -901,15 +901,15 @@ static void cleanup_cpu_root_pgt(unsigned int cpu)
                     continue;
 
                 ASSERT(!(l2e_get_flags(l2t[i2]) & _PAGE_PSE));
-                free_xen_pagetable_new(l2e_get_mfn(l2t[i2]));
+                free_xen_pagetable(l2e_get_mfn(l2t[i2]));
             }
 
             unmap_domain_page(l2t);
-            free_xen_pagetable_new(l2mfn);
+            free_xen_pagetable(l2mfn);
         }
 
         unmap_domain_page(l3t);
-        free_xen_pagetable_new(l3mfn);
+        free_xen_pagetable(l3mfn);
     }
 
     free_xenheap_page(rpt);
diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index cfc3de9091..91d7b24da8 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -621,7 +621,7 @@ void __init paging_init(void)
     UNMAP_DOMAIN_PAGE(l3_ro_mpt);
 
     /* Create user-accessible L2 directory to map the MPT for compat guests. */
-    l2_ro_mpt_mfn = alloc_xen_pagetable_new();
+    l2_ro_mpt_mfn = alloc_xen_pagetable();
     if ( mfn_eq(l2_ro_mpt_mfn, INVALID_MFN) )
         goto nomem;
     compat_idle_pg_table_l2 = map_domain_page_global(l2_ro_mpt_mfn);
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 2ac0cdab83..482fbd9d39 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -583,8 +583,8 @@ int vcpu_destroy_pagetables(struct vcpu *);
 void *do_page_walk(struct vcpu *v, unsigned long addr);
 
 /* Allocator functions for Xen pagetables. */
-mfn_t alloc_xen_pagetable_new(void);
-void free_xen_pagetable_new(mfn_t mfn);
+mfn_t alloc_xen_pagetable(void);
+void free_xen_pagetable(mfn_t mfn);
 void *alloc_map_clear_xen_pt(mfn_t *pmfn);
 
 l1_pgentry_t *virt_to_xen_l1e(unsigned long v);
-- 
2.24.1.AMZN



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 03/15] x86/mm: rewrite virt_to_xen_l*e
  2020-05-29 11:11 ` [PATCH v7 03/15] x86/mm: rewrite virt_to_xen_l*e Hongyan Xia
@ 2020-07-14 10:47   ` Jan Beulich
  2020-07-27  9:09     ` Hongyan Xia
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2020-07-14 10:47 UTC (permalink / raw)
  To: Hongyan Xia
  Cc: Stefano Stabellini, julien, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, xen-devel, Roger Pau Monné

On 29.05.2020 13:11, Hongyan Xia wrote:
> From: Wei Liu <wei.liu2@citrix.com>
> 
> Rewrite those functions to use the new APIs. Modify its callers to unmap
> the pointer returned. Since alloc_xen_pagetable_new() is almost never
> useful unless accompanied by page clearing and a mapping, introduce a
> helper alloc_map_clear_xen_pt() for this sequence.
> 
> Note that the change of virt_to_xen_l1e() also requires vmap_to_mfn() to
> unmap the page, which requires domain_page.h header in vmap.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with two further small adjustments:

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -4948,8 +4948,28 @@ void free_xen_pagetable_new(mfn_t mfn)
>          free_xenheap_page(mfn_to_virt(mfn_x(mfn)));
>  }
>  
> +void *alloc_map_clear_xen_pt(mfn_t *pmfn)
> +{
> +    mfn_t mfn = alloc_xen_pagetable_new();
> +    void *ret;
> +
> +    if ( mfn_eq(mfn, INVALID_MFN) )
> +        return NULL;
> +
> +    if ( pmfn )
> +        *pmfn = mfn;
> +    ret = map_domain_page(mfn);
> +    clear_page(ret);
> +
> +    return ret;
> +}
> +
>  static DEFINE_SPINLOCK(map_pgdir_lock);
>  
> +/*
> + * For virt_to_xen_lXe() functions, they take a virtual address and return a
> + * pointer to Xen's LX entry. Caller needs to unmap the pointer.
> + */
>  static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)

May I suggest s/virtual/linear/ to at least make the new comment
correct?

> --- a/xen/include/asm-x86/page.h
> +++ b/xen/include/asm-x86/page.h
> @@ -291,7 +291,13 @@ void copy_page_sse2(void *, const void *);
>  #define pfn_to_paddr(pfn)   __pfn_to_paddr(pfn)
>  #define paddr_to_pfn(pa)    __paddr_to_pfn(pa)
>  #define paddr_to_pdx(pa)    pfn_to_pdx(paddr_to_pfn(pa))
> -#define vmap_to_mfn(va)     _mfn(l1e_get_pfn(*virt_to_xen_l1e((unsigned long)(va))))
> +
> +#define vmap_to_mfn(va) ({                                                  \
> +        const l1_pgentry_t *pl1e_ = virt_to_xen_l1e((unsigned long)(va));   \
> +        mfn_t mfn_ = l1e_get_mfn(*pl1e_);                                   \
> +        unmap_domain_page(pl1e_);                                           \
> +        mfn_; })

Just like is already the case in domain_page_map_to_mfn() I think
you want to add "BUG_ON(!pl1e)" here to limit the impact of any
problem to DoS (rather than a possible privilege escalation).

Or actually, considering the only case where virt_to_xen_l1e()
would return NULL, returning INVALID_MFN here would likely be
even more robust. There looks to be just a single caller, which
would need adjusting to cope with an error coming back. In fact -
it already ASSERT()s, despite NULL right now never coming back
from vmap_to_page(). I think the loop there would better be

    for ( i = 0; i < pages; i++ )
    {
        struct page_info *page = vmap_to_page(va + i * PAGE_SIZE);

        if ( page )
            page_list_add(page, &pg_list);
        else
            printk_once(...);
    }
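
The macro side of that would then be something along these lines (just a
sketch, untested):

    #define vmap_to_mfn(va) ({                                               \
            const l1_pgentry_t *pl1e_ = virt_to_xen_l1e((unsigned long)(va));\
            mfn_t mfn_ = pl1e_ ? l1e_get_mfn(*pl1e_) : INVALID_MFN;          \
            if ( pl1e_ )                                                     \
                unmap_domain_page(pl1e_);                                    \
            mfn_; })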

Thoughts?

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 04/15] x86/mm: switch to new APIs in map_pages_to_xen
  2020-05-29 11:11 ` [PATCH v7 04/15] x86/mm: switch to new APIs in map_pages_to_xen Hongyan Xia
@ 2020-07-14 11:52   ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2020-07-14 11:52 UTC (permalink / raw)
  To: Hongyan Xia
  Cc: xen-devel, Roger Pau Monné, julien, Wei Liu, Andrew Cooper

On 29.05.2020 13:11, Hongyan Xia wrote:
> From: Wei Liu <wei.liu2@citrix.com>
> 
> Page tables allocated in that function should be mapped and unmapped
> now.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 05/15] x86/mm: switch to new APIs in modify_xen_mappings
  2020-05-29 11:11 ` [PATCH v7 05/15] x86/mm: switch to new APIs in modify_xen_mappings Hongyan Xia
@ 2020-07-14 11:55   ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2020-07-14 11:55 UTC (permalink / raw)
  To: Hongyan Xia
  Cc: xen-devel, Roger Pau Monné, julien, Wei Liu, Andrew Cooper

On 29.05.2020 13:11, Hongyan Xia wrote:
> From: Wei Liu <wei.liu2@citrix.com>
> 
> Page tables allocated in that function should be mapped and unmapped
> now.
> 
> Note that pl2e now may be mapped and unmapped in different iterations, so
> we need to add clean-ups for that.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 06/15] x86_64/mm: introduce pl2e in paging_init
  2020-05-29 11:11 ` [PATCH v7 06/15] x86_64/mm: introduce pl2e in paging_init Hongyan Xia
@ 2020-07-14 12:02   ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2020-07-14 12:02 UTC (permalink / raw)
  To: Hongyan Xia
  Cc: xen-devel, Roger Pau Monné, julien, Wei Liu, Andrew Cooper

On 29.05.2020 13:11, Hongyan Xia wrote:
> From: Wei Liu <wei.liu2@citrix.com>
> 
> We will soon map and unmap pages in paging_init(). Introduce pl2e so
> that we can use l2_ro_mpt to point to the page table itself.
> 
> No functional change.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 07/15] x86_64/mm: switch to new APIs in paging_init
  2020-05-29 11:11 ` [PATCH v7 07/15] x86_64/mm: switch to new APIs " Hongyan Xia
@ 2020-07-14 12:03   ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2020-07-14 12:03 UTC (permalink / raw)
  To: Hongyan Xia
  Cc: xen-devel, Roger Pau Monné, julien, Wei Liu, Andrew Cooper

On 29.05.2020 13:11, Hongyan Xia wrote:
> From: Wei Liu <wei.liu2@citrix.com>
> 
> Map and unmap pages instead of relying on the direct map.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
albeit preferably with ...

> --- a/xen/arch/x86/x86_64/mm.c
> +++ b/xen/arch/x86/x86_64/mm.c
> @@ -481,6 +481,7 @@ void __init paging_init(void)
>      l3_pgentry_t *l3_ro_mpt;
>      l2_pgentry_t *pl2e = NULL, *l2_ro_mpt = NULL;
>      struct page_info *l1_pg;
> +    mfn_t l3_ro_mpt_mfn, l2_ro_mpt_mfn;

... just a single variable named "mfn" here. Afaics the life ranges
allow for this (imo) readability improvement.

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 08/15] x86_64/mm: switch to new APIs in setup_m2p_table
  2020-05-29 11:11 ` [PATCH v7 08/15] x86_64/mm: switch to new APIs in setup_m2p_table Hongyan Xia
@ 2020-07-14 12:32   ` Jan Beulich
  2020-07-14 12:33   ` Jan Beulich
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2020-07-14 12:32 UTC (permalink / raw)
  To: Hongyan Xia
  Cc: xen-devel, Roger Pau Monné, julien, Wei Liu, Andrew Cooper

On 29.05.2020 13:11, Hongyan Xia wrote:
> From: Wei Liu <wei.liu2@citrix.com>
> 
> Avoid repetitive mapping of l2_ro_mpt by keeping it across loops, and
> only unmap and map it when crossing 1G boundaries.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

I do think, however, that ...

> @@ -438,32 +443,29 @@ static int setup_m2p_table(struct mem_hotadd_info *info)
>  
>              ASSERT(!(l3e_get_flags(l3_ro_mpt[l3_table_offset(va)]) &
>                    _PAGE_PSE));
> -            if ( l3e_get_flags(l3_ro_mpt[l3_table_offset(va)]) &
> -              _PAGE_PRESENT )
> -                l2_ro_mpt = l3e_to_l2e(l3_ro_mpt[l3_table_offset(va)]) +
> -                  l2_table_offset(va);
> -            else
> +            if ( (l3e_get_flags(l3_ro_mpt[l3_table_offset(va)]) &
> +                  _PAGE_PRESENT) && !l2_ro_mpt)
> +                l2_ro_mpt = map_l2t_from_l3e(l3_ro_mpt[l3_table_offset(va)]);
> +            else if ( !l2_ro_mpt )
>              {

... this would be slightly neater as

            if ( l2_ro_mpt )
                /* nothing */;
            else if ( l3e_get_flags(l3_ro_mpt[l3_table_offset(va)]) &
                      _PAGE_PRESENT )
                l2_ro_mpt = map_l2t_from_l3e(l3_ro_mpt[l3_table_offset(va)]);
            else
            {
                ...

My R-b holds if you would change it like this.

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 08/15] x86_64/mm: switch to new APIs in setup_m2p_table
  2020-05-29 11:11 ` [PATCH v7 08/15] x86_64/mm: switch to new APIs in setup_m2p_table Hongyan Xia
  2020-07-14 12:32   ` Jan Beulich
@ 2020-07-14 12:33   ` Jan Beulich
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2020-07-14 12:33 UTC (permalink / raw)
  To: Hongyan Xia
  Cc: xen-devel, Roger Pau Monné, julien, Wei Liu, Andrew Cooper

On 29.05.2020 13:11, Hongyan Xia wrote:
> From: Wei Liu <wei.liu2@citrix.com>
> 
> Avoid repetitive mapping of l2_ro_mpt by keeping it across loops, and
> only unmap and map it when crossing 1G boundaries.

Oh, one other thing: This reads as if there were such repetitive
mapping operations, and the patch took care of them. How about
starting this with "While doing so, avoid making ..."?

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 09/15] efi: use new page table APIs in copy_mapping
  2020-05-29 11:11 ` [PATCH v7 09/15] efi: use new page table APIs in copy_mapping Hongyan Xia
@ 2020-07-14 12:42   ` Jan Beulich
  2020-07-27 12:45     ` Hongyan Xia
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2020-07-14 12:42 UTC (permalink / raw)
  To: Hongyan Xia; +Cc: xen-devel, julien

On 29.05.2020 13:11, Hongyan Xia wrote:
> From: Wei Liu <wei.liu2@citrix.com>
> 
> After inspection ARM doesn't have alloc_xen_pagetable so this function
> is x86 only, which means it is safe for us to change.

Well, it sits inside a "#ifndef CONFIG_ARM" section.

> @@ -1442,29 +1443,42 @@ static __init void copy_mapping(unsigned long mfn, unsigned long end,
>                                                   unsigned long emfn))
>  {
>      unsigned long next;
> +    l3_pgentry_t *l3src = NULL, *l3dst = NULL;
>  
>      for ( ; mfn < end; mfn = next )
>      {
>          l4_pgentry_t l4e = efi_l4_pgtable[l4_table_offset(mfn << PAGE_SHIFT)];
> -        l3_pgentry_t *l3src, *l3dst;
>          unsigned long va = (unsigned long)mfn_to_virt(mfn);
>  
> +        if ( !((mfn << PAGE_SHIFT) & ((1UL << L4_PAGETABLE_SHIFT) - 1)) )

To be in line with ...

> +        {
> +            UNMAP_DOMAIN_PAGE(l3src);
> +            UNMAP_DOMAIN_PAGE(l3dst);
> +        }
>          next = mfn + (1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT));

... this, please avoid the left shift of mfn in the if(). Judging from
code further down I also think the zapping of l3src would better be
dependent upon va than upon mfn.
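
I.e. something along the lines of (untested, just to illustrate the two
points):

        if ( !(mfn & ((1UL << (L4_PAGETABLE_SHIFT - PAGE_SHIFT)) - 1)) )
            UNMAP_DOMAIN_PAGE(l3dst);
        if ( !(va & ((1UL << L4_PAGETABLE_SHIFT) - 1)) )
            UNMAP_DOMAIN_PAGE(l3src);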

>          if ( !is_valid(mfn, min(next, end)) )
>              continue;
> -        if ( !(l4e_get_flags(l4e) & _PAGE_PRESENT) )
> +        if ( !l3dst )
>          {
> -            l3dst = alloc_xen_pagetable();
> -            BUG_ON(!l3dst);
> -            clear_page(l3dst);
> -            efi_l4_pgtable[l4_table_offset(mfn << PAGE_SHIFT)] =
> -                l4e_from_paddr(virt_to_maddr(l3dst), __PAGE_HYPERVISOR);
> +            if ( !(l4e_get_flags(l4e) & _PAGE_PRESENT) )
> +            {
> +                mfn_t l3mfn;
> +
> +                l3dst = alloc_map_clear_xen_pt(&l3mfn);
> +                BUG_ON(!l3dst);
> +                efi_l4_pgtable[l4_table_offset(mfn << PAGE_SHIFT)] =
> +                    l4e_from_mfn(l3mfn, __PAGE_HYPERVISOR);
> +            }
> +            else
> +                l3dst = map_l3t_from_l4e(l4e);
>          }
> -        else
> -            l3dst = l4e_to_l3e(l4e);

As for the earlier patch, maybe again neater if you started with

        if ( l3dst )
            /* nothing */;
        else if ...

Would also save a level of indentation as it seems.

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 10/15] efi: switch to new APIs in EFI code
  2020-05-29 11:11 ` [PATCH v7 10/15] efi: switch to new APIs in EFI code Hongyan Xia
@ 2020-07-15 13:06   ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2020-07-15 13:06 UTC (permalink / raw)
  To: Hongyan Xia
  Cc: xen-devel, Roger Pau Monné, julien, Wei Liu, Andrew Cooper

On 29.05.2020 13:11, Hongyan Xia wrote:
> From: Wei Liu <wei.liu2@citrix.com>
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 12/15] x86/smpboot: switch clone_mapping() to new APIs
  2020-05-29 11:11 ` [PATCH v7 12/15] x86/smpboot: switch clone_mapping() to new APIs Hongyan Xia
@ 2020-07-15 13:11   ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2020-07-15 13:11 UTC (permalink / raw)
  To: Hongyan Xia
  Cc: xen-devel, Roger Pau Monné, julien, Wei Liu, Andrew Cooper

On 29.05.2020 13:11, Hongyan Xia wrote:
> From: Wei Liu <wei.liu2@citrix.com>
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 14/15] x86: switch to use domheap page for page tables
  2020-05-29 11:11 ` [PATCH v7 14/15] x86: switch to use domheap page for page tables Hongyan Xia
@ 2020-07-15 13:16   ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2020-07-15 13:16 UTC (permalink / raw)
  To: Hongyan Xia
  Cc: xen-devel, Roger Pau Monné, julien, Wei Liu, Andrew Cooper

On 29.05.2020 13:11, Hongyan Xia wrote:
> From: Hongyan Xia <hongyxia@amazon.com>
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with a sufficiently minor remark:

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -4918,10 +4918,11 @@ mfn_t alloc_xen_pagetable_new(void)
>  {
>      if ( system_state != SYS_STATE_early_boot )
>      {
> -        void *ptr = alloc_xenheap_page();
>  
> -        BUG_ON(!hardware_domain && !ptr);
> -        return ptr ? virt_to_mfn(ptr) : INVALID_MFN;
> +        struct page_info *pg = alloc_domheap_page(NULL, 0);
> +
> +        BUG_ON(!hardware_domain && !pg);
> +        return pg ? page_to_mfn(pg) : INVALID_MFN;

pg doesn't even get de-referenced, let alone modified. Hence it
would better be pointer-to-const, despite this possibly feeling a
little odd to some of us given this is a freshly allocated page.
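
I.e. simply (sketch):

    const struct page_info *pg = alloc_domheap_page(NULL, 0);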

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 03/15] x86/mm: rewrite virt_to_xen_l*e
  2020-07-14 10:47   ` Jan Beulich
@ 2020-07-27  9:09     ` Hongyan Xia
  2020-07-27 19:31       ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Hongyan Xia @ 2020-07-27  9:09 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, julien, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, xen-devel, Roger Pau Monné

On Tue, 2020-07-14 at 12:47 +0200, Jan Beulich wrote:
> On 29.05.2020 13:11, Hongyan Xia wrote:
> > From: Wei Liu <wei.liu2@citrix.com>
> > 
> > Rewrite those functions to use the new APIs. Modify its callers to
> > unmap
> > the pointer returned. Since alloc_xen_pagetable_new() is almost
> > never
> > useful unless accompanied by page clearing and a mapping, introduce
> > a
> > helper alloc_map_clear_xen_pt() for this sequence.
> > 
> > Note that the change of virt_to_xen_l1e() also requires
> > vmap_to_mfn() to
> > unmap the page, which requires domain_page.h header in vmap.
> > 
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> with two further small adjustments:
> 
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -4948,8 +4948,28 @@ void free_xen_pagetable_new(mfn_t mfn)
> >          free_xenheap_page(mfn_to_virt(mfn_x(mfn)));
> >  }
> >  
> > +void *alloc_map_clear_xen_pt(mfn_t *pmfn)
> > +{
> > +    mfn_t mfn = alloc_xen_pagetable_new();
> > +    void *ret;
> > +
> > +    if ( mfn_eq(mfn, INVALID_MFN) )
> > +        return NULL;
> > +
> > +    if ( pmfn )
> > +        *pmfn = mfn;
> > +    ret = map_domain_page(mfn);
> > +    clear_page(ret);
> > +
> > +    return ret;
> > +}
> > +
> >  static DEFINE_SPINLOCK(map_pgdir_lock);
> >  
> > +/*
> > + * For virt_to_xen_lXe() functions, they take a virtual address
> > and return a
> > + * pointer to Xen's LX entry. Caller needs to unmap the pointer.
> > + */
> >  static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)
> 
> May I suggest s/virtual/linear/ to at least make the new comment
> correct?
> 
> > --- a/xen/include/asm-x86/page.h
> > +++ b/xen/include/asm-x86/page.h
> > @@ -291,7 +291,13 @@ void copy_page_sse2(void *, const void *);
> >  #define pfn_to_paddr(pfn)   __pfn_to_paddr(pfn)
> >  #define paddr_to_pfn(pa)    __paddr_to_pfn(pa)
> >  #define paddr_to_pdx(pa)    pfn_to_pdx(paddr_to_pfn(pa))
> > -#define vmap_to_mfn(va)     _mfn(l1e_get_pfn(*virt_to_xen_l1e((unsigned long)(va))))
> > +
> > +#define vmap_to_mfn(va) ({                                                  \
> > +        const l1_pgentry_t *pl1e_ = virt_to_xen_l1e((unsigned long)(va));   \
> > +        mfn_t mfn_ = l1e_get_mfn(*pl1e_);                                   \
> > +        unmap_domain_page(pl1e_);                                           \
> > +        mfn_; })
> 
> Just like is already the case in domain_page_map_to_mfn() I think
> you want to add "BUG_ON(!pl1e)" here to limit the impact of any
> problem to DoS (rather than a possible privilege escalation).
> 
> Or actually, considering the only case where virt_to_xen_l1e()
> would return NULL, returning INVALID_MFN here would likely be
> even more robust. There looks to be just a single caller, which
> would need adjusting to cope with an error coming back. In fact -
> it already ASSERT()s, despite NULL right now never coming back
> from vmap_to_page(). I think the loop there would better be
> 
>     for ( i = 0; i < pages; i++ )
>     {
>         struct page_info *page = vmap_to_page(va + i * PAGE_SIZE);
> 
>         if ( page )
>             page_list_add(page, &pg_list);
>         else
>             printk_once(...);
>     }
> 
> Thoughts?

To be honest, I think the current implementation of vmap_to_mfn() is
just incorrect. There is simply no guarantee that a vmap is mapped with
small pages, so IMO we just cannot do virt_to_xen_l1e() here. The
correct way is to have a generic page table walking function which
walks from the base, can stop at any level, and properly returns a code
indicating the level reached or any error.
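
Just to sketch the kind of interface I mean (name and signature made up):

    /*
     * Walk the page tables for va from the root, stop at whatever level
     * actually maps it, and report that level or an error to the caller.
     */
    int pt_walk(unsigned long va, mfn_t *mfn, unsigned int *level);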

I am inclined to BUG_ON() here, and upstream a proper fix later to
vmap_to_mfn() as an individual patch.
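
The interim version would then just be the quoted macro plus the BUG_ON,
something like (sketch):

    #define vmap_to_mfn(va) ({                                               \
            const l1_pgentry_t *pl1e_ = virt_to_xen_l1e((unsigned long)(va));\
            mfn_t mfn_;                                                      \
            BUG_ON(!pl1e_);                                                  \
            mfn_ = l1e_get_mfn(*pl1e_);                                      \
            unmap_domain_page(pl1e_);                                        \
            mfn_; })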

Am I missing anything here?

Hongyan



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 09/15] efi: use new page table APIs in copy_mapping
  2020-07-14 12:42   ` Jan Beulich
@ 2020-07-27 12:45     ` Hongyan Xia
  2020-07-27 13:45       ` Hongyan Xia
  2020-07-27 19:33       ` Jan Beulich
  0 siblings, 2 replies; 32+ messages in thread
From: Hongyan Xia @ 2020-07-27 12:45 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, julien

On Tue, 2020-07-14 at 14:42 +0200, Jan Beulich wrote:
> On 29.05.2020 13:11, Hongyan Xia wrote:
> > From: Wei Liu <wei.liu2@citrix.com>
> > 
> > After inspection ARM doesn't have alloc_xen_pagetable so this
> > function
> > is x86 only, which means it is safe for us to change.
> 
> Well, it sits inside a "#ifndef CONFIG_ARM" section.
> 
> > @@ -1442,29 +1443,42 @@ static __init void copy_mapping(unsigned
> > long mfn, unsigned long end,
> >                                                   unsigned long
> > emfn))
> >  {
> >      unsigned long next;
> > +    l3_pgentry_t *l3src = NULL, *l3dst = NULL;
> >  
> >      for ( ; mfn < end; mfn = next )
> >      {
> >          l4_pgentry_t l4e = efi_l4_pgtable[l4_table_offset(mfn <<
> > PAGE_SHIFT)];
> > -        l3_pgentry_t *l3src, *l3dst;
> >          unsigned long va = (unsigned long)mfn_to_virt(mfn);
> >  
> > +        if ( !((mfn << PAGE_SHIFT) & ((1UL << L4_PAGETABLE_SHIFT)
> > - 1)) )
> 
> To be in line with ...
> 
> > +        {
> > +            UNMAP_DOMAIN_PAGE(l3src);
> > +            UNMAP_DOMAIN_PAGE(l3dst);
> > +        }
> >          next = mfn + (1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT));
> 
> ... this, please avoid the left shift of mfn in the if(). Judging from

What do you mean by "in line" here? It does not look to me that "next
=" can be easily squashed into the if() condition.

Hongyan



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 09/15] efi: use new page table APIs in copy_mapping
  2020-07-27 12:45     ` Hongyan Xia
@ 2020-07-27 13:45       ` Hongyan Xia
  2020-07-27 19:33       ` Jan Beulich
  1 sibling, 0 replies; 32+ messages in thread
From: Hongyan Xia @ 2020-07-27 13:45 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, julien

On Mon, 2020-07-27 at 13:45 +0100, Hongyan Xia wrote:
> On Tue, 2020-07-14 at 14:42 +0200, Jan Beulich wrote:
> > On 29.05.2020 13:11, Hongyan Xia wrote:
> > > From: Wei Liu <wei.liu2@citrix.com>
> > > 
> > > After inspection ARM doesn't have alloc_xen_pagetable so this
> > > function
> > > is x86 only, which means it is safe for us to change.
> > 
> > Well, it sits inside a "#ifndef CONFIG_ARM" section.
> > 
> > > @@ -1442,29 +1443,42 @@ static __init void copy_mapping(unsigned long mfn, unsigned long end,
> > >                                                   unsigned long emfn))
> > >  {
> > >      unsigned long next;
> > > +    l3_pgentry_t *l3src = NULL, *l3dst = NULL;
> > >  
> > >      for ( ; mfn < end; mfn = next )
> > >      {
> > >          l4_pgentry_t l4e = efi_l4_pgtable[l4_table_offset(mfn << PAGE_SHIFT)];
> > > -        l3_pgentry_t *l3src, *l3dst;
> > >          unsigned long va = (unsigned long)mfn_to_virt(mfn);
> > >  
> > > +        if ( !((mfn << PAGE_SHIFT) & ((1UL << L4_PAGETABLE_SHIFT) - 1)) )
> > 
> > To be in line with ...
> > 
> > > +        {
> > > +            UNMAP_DOMAIN_PAGE(l3src);
> > > +            UNMAP_DOMAIN_PAGE(l3dst);
> > > +        }
> > >          next = mfn + (1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT));
> > 
> > ... this, please avoid the left shift of mfn in the if(). Judging from
> 
> What do you mean by "in line" here? It does not look to me that "next
> =" can be easily squashed into the if() condition.

Sorry, never mind. "in line" != "inline".

Hongyan



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 03/15] x86/mm: rewrite virt_to_xen_l*e
  2020-07-27  9:09     ` Hongyan Xia
@ 2020-07-27 19:31       ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2020-07-27 19:31 UTC (permalink / raw)
  To: Hongyan Xia
  Cc: Stefano Stabellini, julien, Wei Liu, Andrew Cooper, Ian Jackson,
	George Dunlap, xen-devel, Roger Pau Monné

On 27.07.2020 11:09, Hongyan Xia wrote:
> On Tue, 2020-07-14 at 12:47 +0200, Jan Beulich wrote:
>> On 29.05.2020 13:11, Hongyan Xia wrote:
>>> --- a/xen/include/asm-x86/page.h
>>> +++ b/xen/include/asm-x86/page.h
>>> @@ -291,7 +291,13 @@ void copy_page_sse2(void *, const void *);
>>>   #define pfn_to_paddr(pfn)   __pfn_to_paddr(pfn)
>>>   #define paddr_to_pfn(pa)    __paddr_to_pfn(pa)
>>>   #define paddr_to_pdx(pa)    pfn_to_pdx(paddr_to_pfn(pa))
>>> -#define vmap_to_mfn(va)     _mfn(l1e_get_pfn(*virt_to_xen_l1e((unsigned long)(va))))
>>> +
>>> +#define vmap_to_mfn(va) ({                                                  \
>>> +        const l1_pgentry_t *pl1e_ = virt_to_xen_l1e((unsigned long)(va));   \
>>> +        mfn_t mfn_ = l1e_get_mfn(*pl1e_);                                   \
>>> +        unmap_domain_page(pl1e_);                                           \
>>> +        mfn_; })
>>
>> Just like is already the case in domain_page_map_to_mfn() I think
>> you want to add "BUG_ON(!pl1e)" here to limit the impact of any
>> problem to DoS (rather than a possible privilege escalation).
>>
>> Or actually, considering the only case where virt_to_xen_l1e()
>> would return NULL, returning INVALID_MFN here would likely be
>> even more robust. There looks to be just a single caller, which
>> would need adjusting to cope with an error coming back. In fact -
>> it already ASSERT()s, despite NULL right now never coming back
>> from vmap_to_page(). I think the loop there would better be
>>
>>      for ( i = 0; i < pages; i++ )
>>      {
>>          struct page_info *page = vmap_to_page(va + i * PAGE_SIZE);
>>
>>          if ( page )
>>              page_list_add(page, &pg_list);
>>          else
>>              printk_once(...);
>>      }
>>
>> Thoughts?
> 
> To be honest, I think the current implementation of vmap_to_mfn() is
> just incorrect. There is simply no guarantee that a vmap is mapped with
> small pages, so IMO we just cannot do virt_to_xen_x1e() here. The
> correct way is to have a generic page table walking function which
> walks from the base and can stop at any level, and properly return code
> to indicate level or any error.
> 
> I am inclined to BUG_ON() here, and upstream a proper fix later to
> vmap_to_mfn() as an individual patch.

Well, yes, in principle large pages can result from e.g. vmalloc()ing
a large enough area. However, rather than thinking of a generic
walking function as a solution, how about the simple one for the
immediate needs: Add MAP_SMALL_PAGES?
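
Roughly, what I have in mind (just an untested sketch; va, mfn and
nr_pages are placeholders): the path that populates the vmap area later
handed to vmap_to_mfn() would request 4K mappings explicitly, e.g.
something along the lines of

    map_pages_to_xen(va, mfn, nr_pages, PAGE_HYPERVISOR | MAP_SMALL_PAGES);

so that superpage mappings are suppressed and virt_to_xen_l1e() is
guaranteed to find an L1 entry for any address in the area.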

Also, as a general remark: When you disagree with review feedback, I
think it would be quite reasonable to wait with sending the next
version until the disagreement gets resolved, unless doing so would
incur unduly long delays.

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v7 09/15] efi: use new page table APIs in copy_mapping
  2020-07-27 12:45     ` Hongyan Xia
  2020-07-27 13:45       ` Hongyan Xia
@ 2020-07-27 19:33       ` Jan Beulich
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2020-07-27 19:33 UTC (permalink / raw)
  To: Hongyan Xia; +Cc: xen-devel, julien

On 27.07.2020 14:45, Hongyan Xia wrote:
> On Tue, 2020-07-14 at 14:42 +0200, Jan Beulich wrote:
>> On 29.05.2020 13:11, Hongyan Xia wrote:
>>> @@ -1442,29 +1443,42 @@ static __init void copy_mapping(unsigned long mfn, unsigned long end,
>>>                                                    unsigned long emfn))
>>>   {
>>>       unsigned long next;
>>> +    l3_pgentry_t *l3src = NULL, *l3dst = NULL;
>>>   
>>>       for ( ; mfn < end; mfn = next )
>>>       {
>>>           l4_pgentry_t l4e = efi_l4_pgtable[l4_table_offset(mfn << PAGE_SHIFT)];
>>> -        l3_pgentry_t *l3src, *l3dst;
>>>           unsigned long va = (unsigned long)mfn_to_virt(mfn);
>>>   
>>>  +        if ( !((mfn << PAGE_SHIFT) & ((1UL << L4_PAGETABLE_SHIFT) - 1)) )
>>
>> To be in line with ...
>>
>>> +        {
>>> +            UNMAP_DOMAIN_PAGE(l3src);
>>> +            UNMAP_DOMAIN_PAGE(l3dst);
>>> +        }
>>>           next = mfn + (1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT));
>>
>> ... this, please avoid the left shift of mfn in the if(). Judging from
> 
> What do you mean by "in line" here? It does not look to me that "next
> =" can be easily squashed into the if() condition.

I'm not thinking about squashing anything into an if(). I've talked
about avoiding the left shift of mfn, as this last quoted line does
(by instead subtracting PAGE_SHIFT from the left-shift count).
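
I.e. something like this (illustrative only), which is equivalent since
the low PAGE_SHIFT bits of mfn << PAGE_SHIFT are always zero:

    if ( !(mfn & ((1UL << (L4_PAGETABLE_SHIFT - PAGE_SHIFT)) - 1)) )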

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2020-07-27 19:34 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-29 11:11 [PATCH v7 00/15] switch to domheap for Xen page tables Hongyan Xia
2020-05-29 11:11 ` [PATCH v7 01/15] x86/mm: map_pages_to_xen would better have one exit path Hongyan Xia
2020-05-29 11:11 ` [PATCH v7 02/15] x86/mm: make sure there is one exit path for modify_xen_mappings Hongyan Xia
2020-05-29 11:11 ` [PATCH v7 03/15] x86/mm: rewrite virt_to_xen_l*e Hongyan Xia
2020-07-14 10:47   ` Jan Beulich
2020-07-27  9:09     ` Hongyan Xia
2020-07-27 19:31       ` Jan Beulich
2020-05-29 11:11 ` [PATCH v7 04/15] x86/mm: switch to new APIs in map_pages_to_xen Hongyan Xia
2020-07-14 11:52   ` Jan Beulich
2020-05-29 11:11 ` [PATCH v7 05/15] x86/mm: switch to new APIs in modify_xen_mappings Hongyan Xia
2020-07-14 11:55   ` Jan Beulich
2020-05-29 11:11 ` [PATCH v7 06/15] x86_64/mm: introduce pl2e in paging_init Hongyan Xia
2020-07-14 12:02   ` Jan Beulich
2020-05-29 11:11 ` [PATCH v7 07/15] x86_64/mm: switch to new APIs " Hongyan Xia
2020-07-14 12:03   ` Jan Beulich
2020-05-29 11:11 ` [PATCH v7 08/15] x86_64/mm: switch to new APIs in setup_m2p_table Hongyan Xia
2020-07-14 12:32   ` Jan Beulich
2020-07-14 12:33   ` Jan Beulich
2020-05-29 11:11 ` [PATCH v7 09/15] efi: use new page table APIs in copy_mapping Hongyan Xia
2020-07-14 12:42   ` Jan Beulich
2020-07-27 12:45     ` Hongyan Xia
2020-07-27 13:45       ` Hongyan Xia
2020-07-27 19:33       ` Jan Beulich
2020-05-29 11:11 ` [PATCH v7 10/15] efi: switch to new APIs in EFI code Hongyan Xia
2020-07-15 13:06   ` Jan Beulich
2020-05-29 11:11 ` [PATCH v7 11/15] x86/smpboot: add exit path for clone_mapping() Hongyan Xia
2020-05-29 11:11 ` [PATCH v7 12/15] x86/smpboot: switch clone_mapping() to new APIs Hongyan Xia
2020-07-15 13:11   ` Jan Beulich
2020-05-29 11:11 ` [PATCH v7 13/15] x86/mm: drop old page table APIs Hongyan Xia
2020-05-29 11:11 ` [PATCH v7 14/15] x86: switch to use domheap page for page tables Hongyan Xia
2020-07-15 13:16   ` Jan Beulich
2020-05-29 11:11 ` [PATCH v7 15/15] x86/mm: drop _new suffix for page table APIs Hongyan Xia

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).