* [PATCH v8 00/15] switch to domheap for Xen page tables
@ 2020-07-27 14:21 Hongyan Xia
  2020-07-27 14:21 ` [PATCH v8 01/15] x86/mm: map_pages_to_xen would better have one exit path Hongyan Xia
                   ` (14 more replies)
  0 siblings, 15 replies; 31+ messages in thread
From: Hongyan Xia @ 2020-07-27 14:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper, jgrall,
	Ian Jackson, George Dunlap, Jan Beulich, Roger Pau Monné

From: Hongyan Xia <hongyxia@amazon.com>

This series rewrites all the remaining functions and finally makes the
switch from xenheap to domheap for Xen page tables, so that they no
longer need to rely on the direct map. This is a big step towards
removing the direct map.
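
As a purely illustrative sketch of the direction (not code from any single
patch): the old xenheap-based helpers return a pointer that is only usable
because of the direct map, while the new MFN-based helpers make callers map
and unmap the page explicitly, which is what allows the final patch to move
the backing allocation to the domheap:

    /* Old pattern: the page is reachable through the direct map. */
    l2_pgentry_t *l2t = alloc_xen_pagetable();

    if ( !l2t )
        return -ENOMEM;
    clear_page(l2t);
    /* ... install and use l2t via its direct-map address ... */
    free_xen_pagetable(l2t);

    /* New pattern: only an MFN is handed out, so map it around each use. */
    mfn_t l2mfn = alloc_xen_pagetable_new();

    if ( mfn_eq(l2mfn, INVALID_MFN) )
        return -ENOMEM;
    l2t = map_domain_page(l2mfn);
    clear_page(l2t);
    /* ... use l2t only through this temporary mapping ... */
    UNMAP_DOMAIN_PAGE(l2t);      /* unmap and clear the local pointer */
    free_xen_pagetable_new(l2mfn);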

This series depends on the following mini-series:
https://lists.xenproject.org/archives/html/xen-devel/2020-04/msg00730.html

---
Changed in v8:
- address comments in v7.
- rebase

Changed in v7:
- rebase and cleanup.
- address comments in v6.
- add alloc_map_clear_xen_pt() helper to simplify the patches in this
  series.

Changed in v6:
- drop the patches that have already been merged.
- rebase and cleanup.
- rewrite map_pages_to_xen() and modify_xen_mappings() in a way that
  does not require an end_of_loop goto label.

Hongyan Xia (2):
  x86/mm: drop old page table APIs
  x86: switch to use domheap page for page tables

Wei Liu (13):
  x86/mm: map_pages_to_xen would better have one exit path
  x86/mm: make sure there is one exit path for modify_xen_mappings
  x86/mm: rewrite virt_to_xen_l*e
  x86/mm: switch to new APIs in map_pages_to_xen
  x86/mm: switch to new APIs in modify_xen_mappings
  x86_64/mm: introduce pl2e in paging_init
  x86_64/mm: switch to new APIs in paging_init
  x86_64/mm: switch to new APIs in setup_m2p_table
  efi: use new page table APIs in copy_mapping
  efi: switch to new APIs in EFI code
  x86/smpboot: add exit path for clone_mapping()
  x86/smpboot: switch clone_mapping() to new APIs
  x86/mm: drop _new suffix for page table APIs

 xen/arch/x86/domain_page.c |  11 +-
 xen/arch/x86/efi/runtime.h |  13 ++-
 xen/arch/x86/mm.c          | 272 ++++++++++++++++++++++++++++-----------------
 xen/arch/x86/setup.c       |   4 +-
 xen/arch/x86/smpboot.c     |  70 +++++++-----
 xen/arch/x86/x86_64/mm.c   |  81 ++++++++------
 xen/common/efi/boot.c      |  83 +++++++++-----
 xen/common/efi/efi.h       |   3 +-
 xen/common/efi/runtime.c   |   8 +-
 xen/common/vmap.c          |   1 +
 xen/include/asm-x86/mm.h   |   7 +-
 xen/include/asm-x86/page.h |  15 ++-
 12 files changed, 353 insertions(+), 215 deletions(-)

-- 
2.16.6




* [PATCH v8 01/15] x86/mm: map_pages_to_xen would better have one exit path
  2020-07-27 14:21 [PATCH v8 00/15] switch to domheap for Xen page tables Hongyan Xia
@ 2020-07-27 14:21 ` Hongyan Xia
  2020-07-27 14:21 ` [PATCH v8 02/15] x86/mm: make sure there is one exit path for modify_xen_mappings Hongyan Xia
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Hongyan Xia @ 2020-07-27 14:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, jgrall, Wei Liu, Jan Beulich, Roger Pau Monné

From: Wei Liu <wei.liu2@citrix.com>

We will soon rewrite the function to dynamically map and unmap page
tables. Since dynamic mappings may map and unmap pages in different
iterations of the while loop, we need to lift pl3e out of the loop.

No functional change.
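
For context, a sketch of where the later patches in the series take this
(based on the hunks in patch 03; not part of this patch): once pl3e lives
outside the loop, each iteration can unmap what the previous one mapped,
and all error paths can funnel through a single exit label:

    l3_pgentry_t *pl3e = NULL;
    int rc = -ENOMEM;

    while ( nr_mfns != 0 )
    {
        /* Clean up the mapping left behind by the previous iteration. */
        UNMAP_DOMAIN_PAGE(pl3e);

        pl3e = virt_to_xen_l3e(virt);
        if ( !pl3e )
            goto out;
        /* ... */
    }
    rc = 0;

 out:
    unmap_domain_page(pl3e);   /* tolerates NULL */
    return rc;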

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

---
Changed since v4:
- drop the end_of_loop goto label.

Changed since v3:
- remove asserts on rc since rc never gets changed to anything else.
- reword commit message.
---
 xen/arch/x86/mm.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 82bc676553..0ade9b3917 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5085,9 +5085,11 @@ int map_pages_to_xen(
     unsigned int flags)
 {
     bool locking = system_state > SYS_STATE_boot;
+    l3_pgentry_t *pl3e, ol3e;
     l2_pgentry_t *pl2e, ol2e;
     l1_pgentry_t *pl1e, ol1e;
     unsigned int  i;
+    int rc = -ENOMEM;
 
 #define flush_flags(oldf) do {                 \
     unsigned int o_ = (oldf);                  \
@@ -5105,10 +5107,11 @@ int map_pages_to_xen(
 
     while ( nr_mfns != 0 )
     {
-        l3_pgentry_t ol3e, *pl3e = virt_to_xen_l3e(virt);
+        pl3e = virt_to_xen_l3e(virt);
 
         if ( !pl3e )
-            return -ENOMEM;
+            goto out;
+
         ol3e = *pl3e;
 
         if ( cpu_has_page1gb &&
@@ -5198,7 +5201,7 @@ int map_pages_to_xen(
 
             l2t = alloc_xen_pagetable();
             if ( l2t == NULL )
-                return -ENOMEM;
+                goto out;
 
             for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
                 l2e_write(l2t + i,
@@ -5227,7 +5230,7 @@ int map_pages_to_xen(
 
         pl2e = virt_to_xen_l2e(virt);
         if ( !pl2e )
-            return -ENOMEM;
+            goto out;
 
         if ( ((((virt >> PAGE_SHIFT) | mfn_x(mfn)) &
                ((1u << PAGETABLE_ORDER) - 1)) == 0) &&
@@ -5271,7 +5274,7 @@ int map_pages_to_xen(
             {
                 pl1e = virt_to_xen_l1e(virt);
                 if ( pl1e == NULL )
-                    return -ENOMEM;
+                    goto out;
             }
             else if ( l2e_get_flags(*pl2e) & _PAGE_PSE )
             {
@@ -5299,7 +5302,7 @@ int map_pages_to_xen(
 
                 l1t = alloc_xen_pagetable();
                 if ( l1t == NULL )
-                    return -ENOMEM;
+                    goto out;
 
                 for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
                     l1e_write(&l1t[i],
@@ -5445,7 +5448,10 @@ int map_pages_to_xen(
 
 #undef flush_flags
 
-    return 0;
+    rc = 0;
+
+ out:
+    return rc;
 }
 
 int populate_pt_range(unsigned long virt, unsigned long nr_mfns)
-- 
2.16.6




* [PATCH v8 02/15] x86/mm: make sure there is one exit path for modify_xen_mappings
  2020-07-27 14:21 [PATCH v8 00/15] switch to domheap for Xen page tables Hongyan Xia
  2020-07-27 14:21 ` [PATCH v8 01/15] x86/mm: map_pages_to_xen would better have one exit path Hongyan Xia
@ 2020-07-27 14:21 ` Hongyan Xia
  2020-07-27 14:21 ` [PATCH v8 03/15] x86/mm: rewrite virt_to_xen_l*e Hongyan Xia
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Hongyan Xia @ 2020-07-27 14:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, jgrall, Wei Liu, Jan Beulich, Roger Pau Monné

From: Wei Liu <wei.liu2@citrix.com>

We will soon need to dynamically map and unmap page tables in this
function. Since dynamic mappings may map and unmap pl3e in different
iterations, lift pl3e out of the loop.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

---
Changed since v4:
- drop the end_of_loop goto label.

Changed since v3:
- remove asserts on rc since it never gets changed to anything else.
---
 xen/arch/x86/mm.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 0ade9b3917..7a11d022c9 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5474,10 +5474,12 @@ int populate_pt_range(unsigned long virt, unsigned long nr_mfns)
 int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
 {
     bool locking = system_state > SYS_STATE_boot;
+    l3_pgentry_t *pl3e;
     l2_pgentry_t *pl2e;
     l1_pgentry_t *pl1e;
     unsigned int  i;
     unsigned long v = s;
+    int rc = -ENOMEM;
 
     /* Set of valid PTE bits which may be altered. */
 #define FLAGS_MASK (_PAGE_NX|_PAGE_DIRTY|_PAGE_ACCESSED|_PAGE_RW|_PAGE_PRESENT)
@@ -5488,7 +5490,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
 
     while ( v < e )
     {
-        l3_pgentry_t *pl3e = virt_to_xen_l3e(v);
+        pl3e = virt_to_xen_l3e(v);
 
         if ( !pl3e || !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
         {
@@ -5521,7 +5523,8 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
             /* PAGE1GB: shatter the superpage and fall through. */
             l2t = alloc_xen_pagetable();
             if ( !l2t )
-                return -ENOMEM;
+                goto out;
+
             for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
                 l2e_write(l2t + i,
                           l2e_from_pfn(l3e_get_pfn(*pl3e) +
@@ -5578,7 +5581,8 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                 /* PSE: shatter the superpage and try again. */
                 l1t = alloc_xen_pagetable();
                 if ( !l1t )
-                    return -ENOMEM;
+                    goto out;
+
                 for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
                     l1e_write(&l1t[i],
                               l1e_from_pfn(l2e_get_pfn(*pl2e) + i,
@@ -5711,7 +5715,10 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
     flush_area(NULL, FLUSH_TLB_GLOBAL);
 
 #undef FLAGS_MASK
-    return 0;
+    rc = 0;
+
+ out:
+    return rc;
 }
 
 #undef flush_area
-- 
2.16.6




* [PATCH v8 03/15] x86/mm: rewrite virt_to_xen_l*e
  2020-07-27 14:21 [PATCH v8 00/15] switch to domheap for Xen page tables Hongyan Xia
  2020-07-27 14:21 ` [PATCH v8 01/15] x86/mm: map_pages_to_xen would better have one exit path Hongyan Xia
  2020-07-27 14:21 ` [PATCH v8 02/15] x86/mm: make sure there is one exit path for modify_xen_mappings Hongyan Xia
@ 2020-07-27 14:21 ` Hongyan Xia
  2020-08-07 14:05   ` Jan Beulich
  2020-12-07 15:28   ` Hongyan Xia
  2020-07-27 14:21 ` [PATCH v8 04/15] x86/mm: switch to new APIs in map_pages_to_xen Hongyan Xia
                   ` (11 subsequent siblings)
  14 siblings, 2 replies; 31+ messages in thread
From: Hongyan Xia @ 2020-07-27 14:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper, jgrall,
	Ian Jackson, George Dunlap, Jan Beulich, Roger Pau Monné

From: Wei Liu <wei.liu2@citrix.com>

Rewrite those functions to use the new APIs. Modify their callers to
unmap the returned pointer. Since alloc_xen_pagetable_new() is almost
never useful unless accompanied by page clearing and a mapping, introduce
a helper, alloc_map_clear_xen_pt(), for this sequence.

Note that the change to virt_to_xen_l1e() also requires vmap_to_mfn() to
unmap the page, which in turn requires the domain_page.h header in vmap.c.
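
For callers, the contract change is that the returned pointer is now a
transient mapping rather than a direct-map address. A minimal caller-side
sketch (mirroring the hunks below):

    l3_pgentry_t *pl3e = virt_to_xen_l3e(v);

    if ( !pl3e )
        return -ENOMEM;
    /* ... read or update *pl3e ... */
    unmap_domain_page(pl3e);   /* now required: drop the transient mapping */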

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

---
Changed in v8:
- s/virtual address/linear address/.
- BUG_ON() on NULL return in vmap_to_mfn().

Changed in v7:
- remove a comment.
- use l1e_get_mfn() instead of converting things back and forth.
- add alloc_map_clear_xen_pt().
- unmap before the next mapping to reduce mapcache pressure.
- use normal unmap calls instead of the macro in error paths because
  unmap can handle NULL now.
---
 xen/arch/x86/domain_page.c | 11 ++++--
 xen/arch/x86/mm.c          | 96 +++++++++++++++++++++++++++++++++-------------
 xen/common/vmap.c          |  1 +
 xen/include/asm-x86/mm.h   |  1 +
 xen/include/asm-x86/page.h | 10 ++++-
 5 files changed, 88 insertions(+), 31 deletions(-)

diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index b03728e18e..dc8627c1b5 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -333,21 +333,24 @@ void unmap_domain_page_global(const void *ptr)
 mfn_t domain_page_map_to_mfn(const void *ptr)
 {
     unsigned long va = (unsigned long)ptr;
-    const l1_pgentry_t *pl1e;
+    l1_pgentry_t l1e;
 
     if ( va >= DIRECTMAP_VIRT_START )
         return _mfn(virt_to_mfn(ptr));
 
     if ( va >= VMAP_VIRT_START && va < VMAP_VIRT_END )
     {
-        pl1e = virt_to_xen_l1e(va);
+        const l1_pgentry_t *pl1e = virt_to_xen_l1e(va);
+
         BUG_ON(!pl1e);
+        l1e = *pl1e;
+        unmap_domain_page(pl1e);
     }
     else
     {
         ASSERT(va >= MAPCACHE_VIRT_START && va < MAPCACHE_VIRT_END);
-        pl1e = &__linear_l1_table[l1_linear_offset(va)];
+        l1e = __linear_l1_table[l1_linear_offset(va)];
     }
 
-    return l1e_get_mfn(*pl1e);
+    return l1e_get_mfn(l1e);
 }
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 7a11d022c9..fd416c0282 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4965,8 +4965,28 @@ void free_xen_pagetable_new(mfn_t mfn)
         free_xenheap_page(mfn_to_virt(mfn_x(mfn)));
 }
 
+void *alloc_map_clear_xen_pt(mfn_t *pmfn)
+{
+    mfn_t mfn = alloc_xen_pagetable_new();
+    void *ret;
+
+    if ( mfn_eq(mfn, INVALID_MFN) )
+        return NULL;
+
+    if ( pmfn )
+        *pmfn = mfn;
+    ret = map_domain_page(mfn);
+    clear_page(ret);
+
+    return ret;
+}
+
 static DEFINE_SPINLOCK(map_pgdir_lock);
 
+/*
+ * The virt_to_xen_lXe() functions take a linear address and return a
+ * pointer to Xen's LX entry. The caller needs to unmap the pointer.
+ */
 static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)
 {
     l4_pgentry_t *pl4e;
@@ -4975,33 +4995,33 @@ static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)
     if ( !(l4e_get_flags(*pl4e) & _PAGE_PRESENT) )
     {
         bool locking = system_state > SYS_STATE_boot;
-        l3_pgentry_t *l3t = alloc_xen_pagetable();
+        mfn_t l3mfn;
+        l3_pgentry_t *l3t = alloc_map_clear_xen_pt(&l3mfn);
 
         if ( !l3t )
             return NULL;
-        clear_page(l3t);
+        UNMAP_DOMAIN_PAGE(l3t);
         if ( locking )
             spin_lock(&map_pgdir_lock);
         if ( !(l4e_get_flags(*pl4e) & _PAGE_PRESENT) )
         {
-            l4_pgentry_t l4e = l4e_from_paddr(__pa(l3t), __PAGE_HYPERVISOR);
+            l4_pgentry_t l4e = l4e_from_mfn(l3mfn, __PAGE_HYPERVISOR);
 
             l4e_write(pl4e, l4e);
             efi_update_l4_pgtable(l4_table_offset(v), l4e);
-            l3t = NULL;
+            l3mfn = INVALID_MFN;
         }
         if ( locking )
             spin_unlock(&map_pgdir_lock);
-        if ( l3t )
-            free_xen_pagetable(l3t);
+        free_xen_pagetable_new(l3mfn);
     }
 
-    return l4e_to_l3e(*pl4e) + l3_table_offset(v);
+    return map_l3t_from_l4e(*pl4e) + l3_table_offset(v);
 }
 
 static l2_pgentry_t *virt_to_xen_l2e(unsigned long v)
 {
-    l3_pgentry_t *pl3e;
+    l3_pgentry_t *pl3e, l3e;
 
     pl3e = virt_to_xen_l3e(v);
     if ( !pl3e )
@@ -5010,31 +5030,37 @@ static l2_pgentry_t *virt_to_xen_l2e(unsigned long v)
     if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
     {
         bool locking = system_state > SYS_STATE_boot;
-        l2_pgentry_t *l2t = alloc_xen_pagetable();
+        mfn_t l2mfn;
+        l2_pgentry_t *l2t = alloc_map_clear_xen_pt(&l2mfn);
 
         if ( !l2t )
+        {
+            unmap_domain_page(pl3e);
             return NULL;
-        clear_page(l2t);
+        }
+        UNMAP_DOMAIN_PAGE(l2t);
         if ( locking )
             spin_lock(&map_pgdir_lock);
         if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
         {
-            l3e_write(pl3e, l3e_from_paddr(__pa(l2t), __PAGE_HYPERVISOR));
-            l2t = NULL;
+            l3e_write(pl3e, l3e_from_mfn(l2mfn, __PAGE_HYPERVISOR));
+            l2mfn = INVALID_MFN;
         }
         if ( locking )
             spin_unlock(&map_pgdir_lock);
-        if ( l2t )
-            free_xen_pagetable(l2t);
+        free_xen_pagetable_new(l2mfn);
     }
 
     BUG_ON(l3e_get_flags(*pl3e) & _PAGE_PSE);
-    return l3e_to_l2e(*pl3e) + l2_table_offset(v);
+    l3e = *pl3e;
+    unmap_domain_page(pl3e);
+
+    return map_l2t_from_l3e(l3e) + l2_table_offset(v);
 }
 
 l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
 {
-    l2_pgentry_t *pl2e;
+    l2_pgentry_t *pl2e, l2e;
 
     pl2e = virt_to_xen_l2e(v);
     if ( !pl2e )
@@ -5043,26 +5069,32 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
     if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
     {
         bool locking = system_state > SYS_STATE_boot;
-        l1_pgentry_t *l1t = alloc_xen_pagetable();
+        mfn_t l1mfn;
+        l1_pgentry_t *l1t = alloc_map_clear_xen_pt(&l1mfn);
 
         if ( !l1t )
+        {
+            unmap_domain_page(pl2e);
             return NULL;
-        clear_page(l1t);
+        }
+        UNMAP_DOMAIN_PAGE(l1t);
         if ( locking )
             spin_lock(&map_pgdir_lock);
         if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
         {
-            l2e_write(pl2e, l2e_from_paddr(__pa(l1t), __PAGE_HYPERVISOR));
-            l1t = NULL;
+            l2e_write(pl2e, l2e_from_mfn(l1mfn, __PAGE_HYPERVISOR));
+            l1mfn = INVALID_MFN;
         }
         if ( locking )
             spin_unlock(&map_pgdir_lock);
-        if ( l1t )
-            free_xen_pagetable(l1t);
+        free_xen_pagetable_new(l1mfn);
     }
 
     BUG_ON(l2e_get_flags(*pl2e) & _PAGE_PSE);
-    return l2e_to_l1e(*pl2e) + l1_table_offset(v);
+    l2e = *pl2e;
+    unmap_domain_page(pl2e);
+
+    return map_l1t_from_l2e(l2e) + l1_table_offset(v);
 }
 
 /* Convert to from superpage-mapping flags for map_pages_to_xen(). */
@@ -5085,8 +5117,8 @@ int map_pages_to_xen(
     unsigned int flags)
 {
     bool locking = system_state > SYS_STATE_boot;
-    l3_pgentry_t *pl3e, ol3e;
-    l2_pgentry_t *pl2e, ol2e;
+    l3_pgentry_t *pl3e = NULL, ol3e;
+    l2_pgentry_t *pl2e = NULL, ol2e;
     l1_pgentry_t *pl1e, ol1e;
     unsigned int  i;
     int rc = -ENOMEM;
@@ -5107,6 +5139,10 @@ int map_pages_to_xen(
 
     while ( nr_mfns != 0 )
     {
+        /* Clean up mappings mapped in the previous iteration. */
+        UNMAP_DOMAIN_PAGE(pl3e);
+        UNMAP_DOMAIN_PAGE(pl2e);
+
         pl3e = virt_to_xen_l3e(virt);
 
         if ( !pl3e )
@@ -5275,6 +5311,8 @@ int map_pages_to_xen(
                 pl1e = virt_to_xen_l1e(virt);
                 if ( pl1e == NULL )
                     goto out;
+
+                UNMAP_DOMAIN_PAGE(pl1e);
             }
             else if ( l2e_get_flags(*pl2e) & _PAGE_PSE )
             {
@@ -5451,6 +5489,8 @@ int map_pages_to_xen(
     rc = 0;
 
  out:
+    unmap_domain_page(pl2e);
+    unmap_domain_page(pl3e);
     return rc;
 }
 
@@ -5474,7 +5514,7 @@ int populate_pt_range(unsigned long virt, unsigned long nr_mfns)
 int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
 {
     bool locking = system_state > SYS_STATE_boot;
-    l3_pgentry_t *pl3e;
+    l3_pgentry_t *pl3e = NULL;
     l2_pgentry_t *pl2e;
     l1_pgentry_t *pl1e;
     unsigned int  i;
@@ -5490,6 +5530,9 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
 
     while ( v < e )
     {
+        /* Clean up mappings mapped in the previous iteration. */
+        UNMAP_DOMAIN_PAGE(pl3e);
+
         pl3e = virt_to_xen_l3e(v);
 
         if ( !pl3e || !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
@@ -5718,6 +5761,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
     rc = 0;
 
  out:
+    unmap_domain_page(pl3e);
     return rc;
 }
 
diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index faebc1ddf1..9964ab2096 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -1,6 +1,7 @@
 #ifdef VMAP_VIRT_START
 #include <xen/bitmap.h>
 #include <xen/cache.h>
+#include <xen/domain_page.h>
 #include <xen/init.h>
 #include <xen/mm.h>
 #include <xen/pfn.h>
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 7e74996053..5b76308948 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -586,6 +586,7 @@ void *alloc_xen_pagetable(void);
 void free_xen_pagetable(void *v);
 mfn_t alloc_xen_pagetable_new(void);
 void free_xen_pagetable_new(mfn_t mfn);
+void *alloc_map_clear_xen_pt(mfn_t *pmfn);
 
 l1_pgentry_t *virt_to_xen_l1e(unsigned long v);
 
diff --git a/xen/include/asm-x86/page.h b/xen/include/asm-x86/page.h
index f632affaef..608a048c28 100644
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -291,7 +291,15 @@ void copy_page_sse2(void *, const void *);
 #define pfn_to_paddr(pfn)   __pfn_to_paddr(pfn)
 #define paddr_to_pfn(pa)    __paddr_to_pfn(pa)
 #define paddr_to_pdx(pa)    pfn_to_pdx(paddr_to_pfn(pa))
-#define vmap_to_mfn(va)     _mfn(l1e_get_pfn(*virt_to_xen_l1e((unsigned long)(va))))
+
+#define vmap_to_mfn(va) ({                                                  \
+        const l1_pgentry_t *pl1e_ = virt_to_xen_l1e((unsigned long)(va));   \
+        mfn_t mfn_;                                                         \
+        BUG_ON(!pl1e_);                                                     \
+        mfn_ = l1e_get_mfn(*pl1e_);                                         \
+        unmap_domain_page(pl1e_);                                           \
+        mfn_; })
+
 #define vmap_to_page(va)    mfn_to_page(vmap_to_mfn(va))
 
 #endif /* !defined(__ASSEMBLY__) */
-- 
2.16.6




* [PATCH v8 04/15] x86/mm: switch to new APIs in map_pages_to_xen
  2020-07-27 14:21 [PATCH v8 00/15] switch to domheap for Xen page tables Hongyan Xia
                   ` (2 preceding siblings ...)
  2020-07-27 14:21 ` [PATCH v8 03/15] x86/mm: rewrite virt_to_xen_l*e Hongyan Xia
@ 2020-07-27 14:21 ` Hongyan Xia
  2020-07-27 14:21 ` [PATCH v8 05/15] x86/mm: switch to new APIs in modify_xen_mappings Hongyan Xia
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Hongyan Xia @ 2020-07-27 14:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, jgrall, Wei Liu, Jan Beulich, Roger Pau Monné

From: Wei Liu <wei.liu2@citrix.com>

Page table pages allocated in that function now need to be mapped and
unmapped explicitly.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/mm.c | 60 +++++++++++++++++++++++++++++++++----------------------
 1 file changed, 36 insertions(+), 24 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index fd416c0282..edcf164742 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5171,7 +5171,7 @@ int map_pages_to_xen(
                 }
                 else
                 {
-                    l2_pgentry_t *l2t = l3e_to_l2e(ol3e);
+                    l2_pgentry_t *l2t = map_l2t_from_l3e(ol3e);
 
                     for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
                     {
@@ -5183,10 +5183,11 @@ int map_pages_to_xen(
                         else
                         {
                             unsigned int j;
-                            const l1_pgentry_t *l1t = l2e_to_l1e(ol2e);
+                            const l1_pgentry_t *l1t = map_l1t_from_l2e(ol2e);
 
                             for ( j = 0; j < L1_PAGETABLE_ENTRIES; j++ )
                                 flush_flags(l1e_get_flags(l1t[j]));
+                            unmap_domain_page(l1t);
                         }
                     }
                     flush_area(virt, flush_flags);
@@ -5195,9 +5196,10 @@ int map_pages_to_xen(
                         ol2e = l2t[i];
                         if ( (l2e_get_flags(ol2e) & _PAGE_PRESENT) &&
                              !(l2e_get_flags(ol2e) & _PAGE_PSE) )
-                            free_xen_pagetable(l2e_to_l1e(ol2e));
+                            free_xen_pagetable_new(l2e_get_mfn(ol2e));
                     }
-                    free_xen_pagetable(l2t);
+                    unmap_domain_page(l2t);
+                    free_xen_pagetable_new(l3e_get_mfn(ol3e));
                 }
             }
 
@@ -5214,6 +5216,7 @@ int map_pages_to_xen(
             unsigned int flush_flags =
                 FLUSH_TLB | FLUSH_ORDER(2 * PAGETABLE_ORDER);
             l2_pgentry_t *l2t;
+            mfn_t l2mfn;
 
             /* Skip this PTE if there is no change. */
             if ( ((l3e_get_pfn(ol3e) & ~(L2_PAGETABLE_ENTRIES *
@@ -5235,15 +5238,17 @@ int map_pages_to_xen(
                 continue;
             }
 
-            l2t = alloc_xen_pagetable();
-            if ( l2t == NULL )
+            l2mfn = alloc_xen_pagetable_new();
+            if ( mfn_eq(l2mfn, INVALID_MFN) )
                 goto out;
 
+            l2t = map_domain_page(l2mfn);
             for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
                 l2e_write(l2t + i,
                           l2e_from_pfn(l3e_get_pfn(ol3e) +
                                        (i << PAGETABLE_ORDER),
                                        l3e_get_flags(ol3e)));
+            UNMAP_DOMAIN_PAGE(l2t);
 
             if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
                 flush_flags |= FLUSH_TLB_GLOBAL;
@@ -5253,15 +5258,15 @@ int map_pages_to_xen(
             if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
                  (l3e_get_flags(*pl3e) & _PAGE_PSE) )
             {
-                l3e_write_atomic(pl3e, l3e_from_mfn(virt_to_mfn(l2t),
-                                                    __PAGE_HYPERVISOR));
-                l2t = NULL;
+                l3e_write_atomic(pl3e,
+                                 l3e_from_mfn(l2mfn, __PAGE_HYPERVISOR));
+                l2mfn = INVALID_MFN;
             }
             if ( locking )
                 spin_unlock(&map_pgdir_lock);
             flush_area(virt, flush_flags);
-            if ( l2t )
-                free_xen_pagetable(l2t);
+
+            free_xen_pagetable_new(l2mfn);
         }
 
         pl2e = virt_to_xen_l2e(virt);
@@ -5289,12 +5294,13 @@ int map_pages_to_xen(
                 }
                 else
                 {
-                    l1_pgentry_t *l1t = l2e_to_l1e(ol2e);
+                    l1_pgentry_t *l1t = map_l1t_from_l2e(ol2e);
 
                     for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
                         flush_flags(l1e_get_flags(l1t[i]));
                     flush_area(virt, flush_flags);
-                    free_xen_pagetable(l1t);
+                    unmap_domain_page(l1t);
+                    free_xen_pagetable_new(l2e_get_mfn(ol2e));
                 }
             }
 
@@ -5319,6 +5325,7 @@ int map_pages_to_xen(
                 unsigned int flush_flags =
                     FLUSH_TLB | FLUSH_ORDER(PAGETABLE_ORDER);
                 l1_pgentry_t *l1t;
+                mfn_t l1mfn;
 
                 /* Skip this PTE if there is no change. */
                 if ( (((l2e_get_pfn(*pl2e) & ~(L1_PAGETABLE_ENTRIES - 1)) +
@@ -5338,14 +5345,16 @@ int map_pages_to_xen(
                     goto check_l3;
                 }
 
-                l1t = alloc_xen_pagetable();
-                if ( l1t == NULL )
+                l1mfn = alloc_xen_pagetable_new();
+                if ( mfn_eq(l1mfn, INVALID_MFN) )
                     goto out;
 
+                l1t = map_domain_page(l1mfn);
                 for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
                     l1e_write(&l1t[i],
                               l1e_from_pfn(l2e_get_pfn(*pl2e) + i,
                                            lNf_to_l1f(l2e_get_flags(*pl2e))));
+                UNMAP_DOMAIN_PAGE(l1t);
 
                 if ( l2e_get_flags(*pl2e) & _PAGE_GLOBAL )
                     flush_flags |= FLUSH_TLB_GLOBAL;
@@ -5355,20 +5364,21 @@ int map_pages_to_xen(
                 if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) &&
                      (l2e_get_flags(*pl2e) & _PAGE_PSE) )
                 {
-                    l2e_write_atomic(pl2e, l2e_from_mfn(virt_to_mfn(l1t),
+                    l2e_write_atomic(pl2e, l2e_from_mfn(l1mfn,
                                                         __PAGE_HYPERVISOR));
-                    l1t = NULL;
+                    l1mfn = INVALID_MFN;
                 }
                 if ( locking )
                     spin_unlock(&map_pgdir_lock);
                 flush_area(virt, flush_flags);
-                if ( l1t )
-                    free_xen_pagetable(l1t);
+
+                free_xen_pagetable_new(l1mfn);
             }
 
-            pl1e  = l2e_to_l1e(*pl2e) + l1_table_offset(virt);
+            pl1e  = map_l1t_from_l2e(*pl2e) + l1_table_offset(virt);
             ol1e  = *pl1e;
             l1e_write_atomic(pl1e, l1e_from_mfn(mfn, flags));
+            UNMAP_DOMAIN_PAGE(pl1e);
             if ( (l1e_get_flags(ol1e) & _PAGE_PRESENT) )
             {
                 unsigned int flush_flags = FLUSH_TLB | FLUSH_ORDER(0);
@@ -5412,12 +5422,13 @@ int map_pages_to_xen(
                     goto check_l3;
                 }
 
-                l1t = l2e_to_l1e(ol2e);
+                l1t = map_l1t_from_l2e(ol2e);
                 base_mfn = l1e_get_pfn(l1t[0]) & ~(L1_PAGETABLE_ENTRIES - 1);
                 for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
                     if ( (l1e_get_pfn(l1t[i]) != (base_mfn + i)) ||
                          (l1e_get_flags(l1t[i]) != flags) )
                         break;
+                UNMAP_DOMAIN_PAGE(l1t);
                 if ( i == L1_PAGETABLE_ENTRIES )
                 {
                     l2e_write_atomic(pl2e, l2e_from_pfn(base_mfn,
@@ -5427,7 +5438,7 @@ int map_pages_to_xen(
                     flush_area(virt - PAGE_SIZE,
                                FLUSH_TLB_GLOBAL |
                                FLUSH_ORDER(PAGETABLE_ORDER));
-                    free_xen_pagetable(l2e_to_l1e(ol2e));
+                    free_xen_pagetable_new(l2e_get_mfn(ol2e));
                 }
                 else if ( locking )
                     spin_unlock(&map_pgdir_lock);
@@ -5460,7 +5471,7 @@ int map_pages_to_xen(
                 continue;
             }
 
-            l2t = l3e_to_l2e(ol3e);
+            l2t = map_l2t_from_l3e(ol3e);
             base_mfn = l2e_get_pfn(l2t[0]) & ~(L2_PAGETABLE_ENTRIES *
                                               L1_PAGETABLE_ENTRIES - 1);
             for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
@@ -5468,6 +5479,7 @@ int map_pages_to_xen(
                       (base_mfn + (i << PAGETABLE_ORDER))) ||
                      (l2e_get_flags(l2t[i]) != l1f_to_lNf(flags)) )
                     break;
+            UNMAP_DOMAIN_PAGE(l2t);
             if ( i == L2_PAGETABLE_ENTRIES )
             {
                 l3e_write_atomic(pl3e, l3e_from_pfn(base_mfn,
@@ -5477,7 +5489,7 @@ int map_pages_to_xen(
                 flush_area(virt - PAGE_SIZE,
                            FLUSH_TLB_GLOBAL |
                            FLUSH_ORDER(2*PAGETABLE_ORDER));
-                free_xen_pagetable(l3e_to_l2e(ol3e));
+                free_xen_pagetable_new(l3e_get_mfn(ol3e));
             }
             else if ( locking )
                 spin_unlock(&map_pgdir_lock);
-- 
2.16.6




* [PATCH v8 05/15] x86/mm: switch to new APIs in modify_xen_mappings
  2020-07-27 14:21 [PATCH v8 00/15] switch to domheap for Xen page tables Hongyan Xia
                   ` (3 preceding siblings ...)
  2020-07-27 14:21 ` [PATCH v8 04/15] x86/mm: switch to new APIs in map_pages_to_xen Hongyan Xia
@ 2020-07-27 14:21 ` Hongyan Xia
  2020-07-27 14:21 ` [PATCH v8 06/15] x86_64/mm: introduce pl2e in paging_init Hongyan Xia
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Hongyan Xia @ 2020-07-27 14:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, jgrall, Wei Liu, Jan Beulich, Roger Pau Monné

From: Wei Liu <wei.liu2@citrix.com>

Page table pages allocated in that function now need to be mapped and
unmapped explicitly.

Note that pl2e may now be mapped and unmapped in different iterations, so
we need to add clean-ups for that.
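
A sketch of the clean-up pattern this refers to (matching the hunks below):
pl2e now starts out NULL, is unmapped at the top of each iteration, and is
unmapped once more on the exit path:

    l2_pgentry_t *pl2e = NULL;

    while ( v < e )
    {
        /* Clean up mappings mapped in the previous iteration. */
        UNMAP_DOMAIN_PAGE(pl2e);
        /* ... */
        pl2e = map_l2t_from_l3e(*pl3e) + l2_table_offset(v);
        /* ... */
    }
    rc = 0;

 out:
    unmap_domain_page(pl2e);   /* tolerates NULL */
    return rc;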

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

---
Changed in v7:
- use normal unmap in the error path.
---
 xen/arch/x86/mm.c | 57 +++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 36 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index edcf164742..199940a345 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5527,7 +5527,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
 {
     bool locking = system_state > SYS_STATE_boot;
     l3_pgentry_t *pl3e = NULL;
-    l2_pgentry_t *pl2e;
+    l2_pgentry_t *pl2e = NULL;
     l1_pgentry_t *pl1e;
     unsigned int  i;
     unsigned long v = s;
@@ -5543,6 +5543,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
     while ( v < e )
     {
         /* Clean up mappings mapped in the previous iteration. */
+        UNMAP_DOMAIN_PAGE(pl2e);
         UNMAP_DOMAIN_PAGE(pl3e);
 
         pl3e = virt_to_xen_l3e(v);
@@ -5560,6 +5561,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
         if ( l3e_get_flags(*pl3e) & _PAGE_PSE )
         {
             l2_pgentry_t *l2t;
+            mfn_t l2mfn;
 
             if ( l2_table_offset(v) == 0 &&
                  l1_table_offset(v) == 0 &&
@@ -5576,35 +5578,38 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
             }
 
             /* PAGE1GB: shatter the superpage and fall through. */
-            l2t = alloc_xen_pagetable();
-            if ( !l2t )
+            l2mfn = alloc_xen_pagetable_new();
+            if ( mfn_eq(l2mfn, INVALID_MFN) )
                 goto out;
 
+            l2t = map_domain_page(l2mfn);
             for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
                 l2e_write(l2t + i,
                           l2e_from_pfn(l3e_get_pfn(*pl3e) +
                                        (i << PAGETABLE_ORDER),
                                        l3e_get_flags(*pl3e)));
+            UNMAP_DOMAIN_PAGE(l2t);
+
             if ( locking )
                 spin_lock(&map_pgdir_lock);
             if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
                  (l3e_get_flags(*pl3e) & _PAGE_PSE) )
             {
-                l3e_write_atomic(pl3e, l3e_from_mfn(virt_to_mfn(l2t),
-                                                    __PAGE_HYPERVISOR));
-                l2t = NULL;
+                l3e_write_atomic(pl3e,
+                                 l3e_from_mfn(l2mfn, __PAGE_HYPERVISOR));
+                l2mfn = INVALID_MFN;
             }
             if ( locking )
                 spin_unlock(&map_pgdir_lock);
-            if ( l2t )
-                free_xen_pagetable(l2t);
+
+            free_xen_pagetable_new(l2mfn);
         }
 
         /*
          * The L3 entry has been verified to be present, and we've dealt with
          * 1G pages as well, so the L2 table cannot require allocation.
          */
-        pl2e = l3e_to_l2e(*pl3e) + l2_table_offset(v);
+        pl2e = map_l2t_from_l3e(*pl3e) + l2_table_offset(v);
 
         if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
         {
@@ -5632,41 +5637,45 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
             else
             {
                 l1_pgentry_t *l1t;
-
                 /* PSE: shatter the superpage and try again. */
-                l1t = alloc_xen_pagetable();
-                if ( !l1t )
+                mfn_t l1mfn = alloc_xen_pagetable_new();
+
+                if ( mfn_eq(l1mfn, INVALID_MFN) )
                     goto out;
 
+                l1t = map_domain_page(l1mfn);
                 for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
                     l1e_write(&l1t[i],
                               l1e_from_pfn(l2e_get_pfn(*pl2e) + i,
                                            l2e_get_flags(*pl2e) & ~_PAGE_PSE));
+                UNMAP_DOMAIN_PAGE(l1t);
+
                 if ( locking )
                     spin_lock(&map_pgdir_lock);
                 if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) &&
                      (l2e_get_flags(*pl2e) & _PAGE_PSE) )
                 {
-                    l2e_write_atomic(pl2e, l2e_from_mfn(virt_to_mfn(l1t),
+                    l2e_write_atomic(pl2e, l2e_from_mfn(l1mfn,
                                                         __PAGE_HYPERVISOR));
-                    l1t = NULL;
+                    l1mfn = INVALID_MFN;
                 }
                 if ( locking )
                     spin_unlock(&map_pgdir_lock);
-                if ( l1t )
-                    free_xen_pagetable(l1t);
+
+                free_xen_pagetable_new(l1mfn);
             }
         }
         else
         {
             l1_pgentry_t nl1e, *l1t;
+            mfn_t l1mfn;
 
             /*
              * Ordinary 4kB mapping: The L2 entry has been verified to be
              * present, and we've dealt with 2M pages as well, so the L1 table
              * cannot require allocation.
              */
-            pl1e = l2e_to_l1e(*pl2e) + l1_table_offset(v);
+            pl1e = map_l1t_from_l2e(*pl2e) + l1_table_offset(v);
 
             /* Confirm the caller isn't trying to create new mappings. */
             if ( !(l1e_get_flags(*pl1e) & _PAGE_PRESENT) )
@@ -5677,6 +5686,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                                (l1e_get_flags(*pl1e) & ~FLAGS_MASK) | nf);
 
             l1e_write_atomic(pl1e, nl1e);
+            UNMAP_DOMAIN_PAGE(pl1e);
             v += PAGE_SIZE;
 
             /*
@@ -5706,10 +5716,12 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                 continue;
             }
 
-            l1t = l2e_to_l1e(*pl2e);
+            l1mfn = l2e_get_mfn(*pl2e);
+            l1t = map_domain_page(l1mfn);
             for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
                 if ( l1e_get_intpte(l1t[i]) != 0 )
                     break;
+            UNMAP_DOMAIN_PAGE(l1t);
             if ( i == L1_PAGETABLE_ENTRIES )
             {
                 /* Empty: zap the L2E and free the L1 page. */
@@ -5717,7 +5729,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                 if ( locking )
                     spin_unlock(&map_pgdir_lock);
                 flush_area(NULL, FLUSH_TLB_GLOBAL); /* flush before free */
-                free_xen_pagetable(l1t);
+                free_xen_pagetable_new(l1mfn);
             }
             else if ( locking )
                 spin_unlock(&map_pgdir_lock);
@@ -5748,11 +5760,13 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
 
         {
             l2_pgentry_t *l2t;
+            mfn_t l2mfn = l3e_get_mfn(*pl3e);
 
-            l2t = l3e_to_l2e(*pl3e);
+            l2t = map_domain_page(l2mfn);
             for ( i = 0; i < L2_PAGETABLE_ENTRIES; i++ )
                 if ( l2e_get_intpte(l2t[i]) != 0 )
                     break;
+            UNMAP_DOMAIN_PAGE(l2t);
             if ( i == L2_PAGETABLE_ENTRIES )
             {
                 /* Empty: zap the L3E and free the L2 page. */
@@ -5760,7 +5774,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                 if ( locking )
                     spin_unlock(&map_pgdir_lock);
                 flush_area(NULL, FLUSH_TLB_GLOBAL); /* flush before free */
-                free_xen_pagetable(l2t);
+                free_xen_pagetable_new(l2mfn);
             }
             else if ( locking )
                 spin_unlock(&map_pgdir_lock);
@@ -5773,6 +5787,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
     rc = 0;
 
  out:
+    unmap_domain_page(pl2e);
     unmap_domain_page(pl3e);
     return rc;
 }
-- 
2.16.6




* [PATCH v8 06/15] x86_64/mm: introduce pl2e in paging_init
  2020-07-27 14:21 [PATCH v8 00/15] switch to domheap for Xen page tables Hongyan Xia
                   ` (4 preceding siblings ...)
  2020-07-27 14:21 ` [PATCH v8 05/15] x86/mm: switch to new APIs in modify_xen_mappings Hongyan Xia
@ 2020-07-27 14:21 ` Hongyan Xia
  2020-07-27 14:21 ` [PATCH v8 07/15] x86_64/mm: switch to new APIs " Hongyan Xia
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Hongyan Xia @ 2020-07-27 14:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, jgrall, Wei Liu, Jan Beulich, Roger Pau Monné

From: Wei Liu <wei.liu2@citrix.com>

We will soon map and unmap pages in paging_init(). Introduce pl2e as the
iteration cursor so that l2_ro_mpt can keep pointing to the start of the
page table itself.

No functional change.
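
A sketch of the resulting split of roles (based on this patch and the
unmaps added by the next one): l2_ro_mpt keeps the base of the table, which
is what can later be handed to UNMAP_DOMAIN_PAGE(), while pl2e is the
cursor that advances over the entries:

    l2_pgentry_t *pl2e = NULL, *l2_ro_mpt = NULL;

    /* l2_ro_mpt records the base of the newly allocated table ... */
    l2_ro_mpt = alloc_xen_pagetable();   /* alloc_map_clear_xen_pt() from patch 07 on */
    if ( l2_ro_mpt == NULL )
        goto nomem;                      /* paging_init()'s existing error label */
    clear_page(l2_ro_mpt);
    pl2e = l2_ro_mpt;

    /* ... while pl2e walks the individual entries. */
    l2e_write(pl2e, l2e_from_page(l1_pg, _PAGE_PSE|_PAGE_USER|_PAGE_PRESENT));
    pl2e++;

    /* Next patch: unmap via the base pointer, not the advanced cursor. */
    UNMAP_DOMAIN_PAGE(l2_ro_mpt);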

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

---
Changed in v7:
- reword commit message.
---
 xen/arch/x86/x86_64/mm.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 102079a801..243014a119 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -479,7 +479,7 @@ void __init paging_init(void)
     unsigned long i, mpt_size, va;
     unsigned int n, memflags;
     l3_pgentry_t *l3_ro_mpt;
-    l2_pgentry_t *l2_ro_mpt = NULL;
+    l2_pgentry_t *pl2e = NULL, *l2_ro_mpt = NULL;
     struct page_info *l1_pg;
 
     /*
@@ -529,7 +529,7 @@ void __init paging_init(void)
             (L2_PAGETABLE_SHIFT - 3 + PAGE_SHIFT)));
 
         if ( cpu_has_page1gb &&
-             !((unsigned long)l2_ro_mpt & ~PAGE_MASK) &&
+             !((unsigned long)pl2e & ~PAGE_MASK) &&
              (mpt_size >> L3_PAGETABLE_SHIFT) > (i >> PAGETABLE_ORDER) )
         {
             unsigned int k, holes;
@@ -589,7 +589,7 @@ void __init paging_init(void)
             memset((void *)(RDWR_MPT_VIRT_START + (i << L2_PAGETABLE_SHIFT)),
                    0xFF, 1UL << L2_PAGETABLE_SHIFT);
         }
-        if ( !((unsigned long)l2_ro_mpt & ~PAGE_MASK) )
+        if ( !((unsigned long)pl2e & ~PAGE_MASK) )
         {
             if ( (l2_ro_mpt = alloc_xen_pagetable()) == NULL )
                 goto nomem;
@@ -597,13 +597,14 @@ void __init paging_init(void)
             l3e_write(&l3_ro_mpt[l3_table_offset(va)],
                       l3e_from_paddr(__pa(l2_ro_mpt),
                                      __PAGE_HYPERVISOR_RO | _PAGE_USER));
+            pl2e = l2_ro_mpt;
             ASSERT(!l2_table_offset(va));
         }
         /* NB. Cannot be GLOBAL: guest user mode should not see it. */
         if ( l1_pg )
-            l2e_write(l2_ro_mpt, l2e_from_page(
+            l2e_write(pl2e, l2e_from_page(
                 l1_pg, /*_PAGE_GLOBAL|*/_PAGE_PSE|_PAGE_USER|_PAGE_PRESENT));
-        l2_ro_mpt++;
+        pl2e++;
     }
 #undef CNT
 #undef MFN
@@ -613,6 +614,7 @@ void __init paging_init(void)
         goto nomem;
     compat_idle_pg_table_l2 = l2_ro_mpt;
     clear_page(l2_ro_mpt);
+    pl2e = l2_ro_mpt;
     /* Allocate and map the compatibility mode machine-to-phys table. */
     mpt_size = (mpt_size >> 1) + (1UL << (L2_PAGETABLE_SHIFT - 1));
     if ( mpt_size > RDWR_COMPAT_MPT_VIRT_END - RDWR_COMPAT_MPT_VIRT_START )
@@ -625,7 +627,7 @@ void __init paging_init(void)
              sizeof(*compat_machine_to_phys_mapping))
     BUILD_BUG_ON((sizeof(*frame_table) & ~sizeof(*frame_table)) % \
                  sizeof(*compat_machine_to_phys_mapping));
-    for ( i = 0; i < (mpt_size >> L2_PAGETABLE_SHIFT); i++, l2_ro_mpt++ )
+    for ( i = 0; i < (mpt_size >> L2_PAGETABLE_SHIFT); i++, pl2e++ )
     {
         memflags = MEMF_node(phys_to_nid(i <<
             (L2_PAGETABLE_SHIFT - 2 + PAGE_SHIFT)));
@@ -647,7 +649,7 @@ void __init paging_init(void)
                         (i << L2_PAGETABLE_SHIFT)),
                0xFF, 1UL << L2_PAGETABLE_SHIFT);
         /* NB. Cannot be GLOBAL as the ptes get copied into per-VM space. */
-        l2e_write(l2_ro_mpt, l2e_from_page(l1_pg, _PAGE_PSE|_PAGE_PRESENT));
+        l2e_write(pl2e, l2e_from_page(l1_pg, _PAGE_PSE|_PAGE_PRESENT));
     }
 #undef CNT
 #undef MFN
-- 
2.16.6




* [PATCH v8 07/15] x86_64/mm: switch to new APIs in paging_init
  2020-07-27 14:21 [PATCH v8 00/15] switch to domheap for Xen page tables Hongyan Xia
                   ` (5 preceding siblings ...)
  2020-07-27 14:21 ` [PATCH v8 06/15] x86_64/mm: introduce pl2e in paging_init Hongyan Xia
@ 2020-07-27 14:21 ` Hongyan Xia
  2020-08-07 14:09   ` Jan Beulich
  2020-07-27 14:21 ` [PATCH v8 08/15] x86_64/mm: switch to new APIs in setup_m2p_table Hongyan Xia
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 31+ messages in thread
From: Hongyan Xia @ 2020-07-27 14:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, jgrall, Wei Liu, Jan Beulich, Roger Pau Monné

From: Wei Liu <wei.liu2@citrix.com>

Map and unmap pages instead of relying on the direct map.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

---
Changed in v8:
- replace l3/2_ro_mpt_mfn with just mfn since their lifetimes do not
  overlap

Changed in v7:
- use the new alloc_map_clear_xen_pt() helper.
- move the unmap of pl3t up a bit.
- remove the unmaps in the nomem path.
---
 xen/arch/x86/x86_64/mm.c | 35 +++++++++++++++++++++--------------
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 243014a119..ebf21d505b 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -481,6 +481,7 @@ void __init paging_init(void)
     l3_pgentry_t *l3_ro_mpt;
     l2_pgentry_t *pl2e = NULL, *l2_ro_mpt = NULL;
     struct page_info *l1_pg;
+    mfn_t mfn;
 
     /*
      * We setup the L3s for 1:1 mapping if host support memory hotplug
@@ -493,22 +494,23 @@ void __init paging_init(void)
         if ( !(l4e_get_flags(idle_pg_table[l4_table_offset(va)]) &
               _PAGE_PRESENT) )
         {
-            l3_pgentry_t *pl3t = alloc_xen_pagetable();
+            mfn_t l3mfn;
+            l3_pgentry_t *pl3t = alloc_map_clear_xen_pt(&l3mfn);
 
             if ( !pl3t )
                 goto nomem;
-            clear_page(pl3t);
+            UNMAP_DOMAIN_PAGE(pl3t);
             l4e_write(&idle_pg_table[l4_table_offset(va)],
-                      l4e_from_paddr(__pa(pl3t), __PAGE_HYPERVISOR_RW));
+                      l4e_from_mfn(l3mfn, __PAGE_HYPERVISOR_RW));
         }
     }
 
     /* Create user-accessible L2 directory to map the MPT for guests. */
-    if ( (l3_ro_mpt = alloc_xen_pagetable()) == NULL )
+    l3_ro_mpt = alloc_map_clear_xen_pt(&mfn);
+    if ( !l3_ro_mpt )
         goto nomem;
-    clear_page(l3_ro_mpt);
     l4e_write(&idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)],
-              l4e_from_paddr(__pa(l3_ro_mpt), __PAGE_HYPERVISOR_RO | _PAGE_USER));
+              l4e_from_mfn(mfn, __PAGE_HYPERVISOR_RO | _PAGE_USER));
 
     /*
      * Allocate and map the machine-to-phys table.
@@ -591,12 +593,14 @@ void __init paging_init(void)
         }
         if ( !((unsigned long)pl2e & ~PAGE_MASK) )
         {
-            if ( (l2_ro_mpt = alloc_xen_pagetable()) == NULL )
+            UNMAP_DOMAIN_PAGE(l2_ro_mpt);
+
+            l2_ro_mpt = alloc_map_clear_xen_pt(&mfn);
+            if ( !l2_ro_mpt )
                 goto nomem;
-            clear_page(l2_ro_mpt);
+
             l3e_write(&l3_ro_mpt[l3_table_offset(va)],
-                      l3e_from_paddr(__pa(l2_ro_mpt),
-                                     __PAGE_HYPERVISOR_RO | _PAGE_USER));
+                      l3e_from_mfn(mfn, __PAGE_HYPERVISOR_RO | _PAGE_USER));
             pl2e = l2_ro_mpt;
             ASSERT(!l2_table_offset(va));
         }
@@ -608,13 +612,16 @@ void __init paging_init(void)
     }
 #undef CNT
 #undef MFN
+    UNMAP_DOMAIN_PAGE(l2_ro_mpt);
+    UNMAP_DOMAIN_PAGE(l3_ro_mpt);
 
     /* Create user-accessible L2 directory to map the MPT for compat guests. */
-    if ( (l2_ro_mpt = alloc_xen_pagetable()) == NULL )
+    mfn = alloc_xen_pagetable_new();
+    if ( mfn_eq(mfn, INVALID_MFN) )
         goto nomem;
-    compat_idle_pg_table_l2 = l2_ro_mpt;
-    clear_page(l2_ro_mpt);
-    pl2e = l2_ro_mpt;
+    compat_idle_pg_table_l2 = map_domain_page_global(mfn);
+    clear_page(compat_idle_pg_table_l2);
+    pl2e = compat_idle_pg_table_l2;
     /* Allocate and map the compatibility mode machine-to-phys table. */
     mpt_size = (mpt_size >> 1) + (1UL << (L2_PAGETABLE_SHIFT - 1));
     if ( mpt_size > RDWR_COMPAT_MPT_VIRT_END - RDWR_COMPAT_MPT_VIRT_START )
-- 
2.16.6




* [PATCH v8 08/15] x86_64/mm: switch to new APIs in setup_m2p_table
  2020-07-27 14:21 [PATCH v8 00/15] switch to domheap for Xen page tables Hongyan Xia
                   ` (6 preceding siblings ...)
  2020-07-27 14:21 ` [PATCH v8 07/15] x86_64/mm: switch to new APIs " Hongyan Xia
@ 2020-07-27 14:21 ` Hongyan Xia
  2020-07-27 14:21 ` [PATCH v8 09/15] efi: use new page table APIs in copy_mapping Hongyan Xia
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Hongyan Xia @ 2020-07-27 14:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, jgrall, Wei Liu, Jan Beulich, Roger Pau Monné

From: Wei Liu <wei.liu2@citrix.com>

While doing so, avoid repeatedly mapping l2_ro_mpt by keeping it mapped
across loop iterations, and only unmap and remap it when crossing a 1G
boundary.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

---
Changed in v8:
- re-structure if condition around l2_ro_mpt.
- reword the commit message.

Changed in v7:
- avoid repetitive mapping of l2_ro_mpt.
- edit commit message.
- switch to alloc_map_clear_xen_pt().
---
 xen/arch/x86/x86_64/mm.c | 32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index ebf21d505b..640f561faf 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -385,7 +385,8 @@ static int setup_m2p_table(struct mem_hotadd_info *info)
 
     ASSERT(l4e_get_flags(idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)])
             & _PAGE_PRESENT);
-    l3_ro_mpt = l4e_to_l3e(idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)]);
+    l3_ro_mpt = map_l3t_from_l4e(
+                    idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)]);
 
     smap = (info->spfn & (~((1UL << (L2_PAGETABLE_SHIFT - 3)) -1)));
     emap = ((info->epfn + ((1UL << (L2_PAGETABLE_SHIFT - 3)) - 1 )) &
@@ -403,6 +404,10 @@ static int setup_m2p_table(struct mem_hotadd_info *info)
     i = smap;
     while ( i < emap )
     {
+        if ( (RO_MPT_VIRT_START + i * sizeof(*machine_to_phys_mapping)) &
+             ((1UL << L3_PAGETABLE_SHIFT) - 1) )
+            UNMAP_DOMAIN_PAGE(l2_ro_mpt);
+
         switch ( m2p_mapped(i) )
         {
         case M2P_1G_MAPPED:
@@ -438,32 +443,31 @@ static int setup_m2p_table(struct mem_hotadd_info *info)
 
             ASSERT(!(l3e_get_flags(l3_ro_mpt[l3_table_offset(va)]) &
                   _PAGE_PSE));
-            if ( l3e_get_flags(l3_ro_mpt[l3_table_offset(va)]) &
-              _PAGE_PRESENT )
-                l2_ro_mpt = l3e_to_l2e(l3_ro_mpt[l3_table_offset(va)]) +
-                  l2_table_offset(va);
+            if ( l2_ro_mpt )
+                /* nothing */;
+            else if ( l3e_get_flags(l3_ro_mpt[l3_table_offset(va)]) &
+                      _PAGE_PRESENT )
+                l2_ro_mpt = map_l2t_from_l3e(l3_ro_mpt[l3_table_offset(va)]);
             else
             {
-                l2_ro_mpt = alloc_xen_pagetable();
+                mfn_t l2_ro_mpt_mfn;
+
+                l2_ro_mpt = alloc_map_clear_xen_pt(&l2_ro_mpt_mfn);
                 if ( !l2_ro_mpt )
                 {
                     ret = -ENOMEM;
                     goto error;
                 }
 
-                clear_page(l2_ro_mpt);
                 l3e_write(&l3_ro_mpt[l3_table_offset(va)],
-                          l3e_from_paddr(__pa(l2_ro_mpt),
-                                         __PAGE_HYPERVISOR_RO | _PAGE_USER));
-                l2_ro_mpt += l2_table_offset(va);
+                          l3e_from_mfn(l2_ro_mpt_mfn,
+                                       __PAGE_HYPERVISOR_RO | _PAGE_USER));
             }
 
             /* NB. Cannot be GLOBAL: guest user mode should not see it. */
-            l2e_write(l2_ro_mpt, l2e_from_mfn(mfn,
+            l2e_write(&l2_ro_mpt[l2_table_offset(va)], l2e_from_mfn(mfn,
                    /*_PAGE_GLOBAL|*/_PAGE_PSE|_PAGE_USER|_PAGE_PRESENT));
         }
-        if ( !((unsigned long)l2_ro_mpt & ~PAGE_MASK) )
-            l2_ro_mpt = NULL;
         i += ( 1UL << (L2_PAGETABLE_SHIFT - 3));
     }
 #undef CNT
@@ -471,6 +475,8 @@ static int setup_m2p_table(struct mem_hotadd_info *info)
 
     ret = setup_compat_m2p_table(info);
 error:
+    unmap_domain_page(l2_ro_mpt);
+    unmap_domain_page(l3_ro_mpt);
     return ret;
 }
 
-- 
2.16.6




* [PATCH v8 09/15] efi: use new page table APIs in copy_mapping
  2020-07-27 14:21 [PATCH v8 00/15] switch to domheap for Xen page tables Hongyan Xia
                   ` (7 preceding siblings ...)
  2020-07-27 14:21 ` [PATCH v8 08/15] x86_64/mm: switch to new APIs in setup_m2p_table Hongyan Xia
@ 2020-07-27 14:21 ` Hongyan Xia
  2020-08-07 14:13   ` Jan Beulich
  2020-07-27 14:22 ` [PATCH v8 10/15] efi: switch to new APIs in EFI code Hongyan Xia
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 31+ messages in thread
From: Hongyan Xia @ 2020-07-27 14:21 UTC (permalink / raw)
  To: xen-devel; +Cc: jgrall, Jan Beulich

From: Wei Liu <wei.liu2@citrix.com>

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

---
Changed in v8:
- remove redundant commit message.
- unmap l3src based on va instead of mfn.
- re-structure if condition around l3dst.

Changed in v7:
- hoist l3 variables out of the loop to avoid repetitive mappings.
---
 xen/common/efi/boot.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c
index 5a520bf21d..f116759538 100644
--- a/xen/common/efi/boot.c
+++ b/xen/common/efi/boot.c
@@ -6,6 +6,7 @@
 #include <xen/compile.h>
 #include <xen/ctype.h>
 #include <xen/dmi.h>
+#include <xen/domain_page.h>
 #include <xen/init.h>
 #include <xen/keyhandler.h>
 #include <xen/lib.h>
@@ -1442,29 +1443,42 @@ static __init void copy_mapping(unsigned long mfn, unsigned long end,
                                                  unsigned long emfn))
 {
     unsigned long next;
+    l3_pgentry_t *l3src = NULL, *l3dst = NULL;
 
     for ( ; mfn < end; mfn = next )
     {
         l4_pgentry_t l4e = efi_l4_pgtable[l4_table_offset(mfn << PAGE_SHIFT)];
-        l3_pgentry_t *l3src, *l3dst;
         unsigned long va = (unsigned long)mfn_to_virt(mfn);
 
+        if ( !(mfn & ((1UL << (L4_PAGETABLE_SHIFT - PAGE_SHIFT)) - 1)) )
+            UNMAP_DOMAIN_PAGE(l3dst);
+        if ( !(va & ((1UL << L4_PAGETABLE_SHIFT) - 1)) )
+            UNMAP_DOMAIN_PAGE(l3src);
         next = mfn + (1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT));
         if ( !is_valid(mfn, min(next, end)) )
             continue;
-        if ( !(l4e_get_flags(l4e) & _PAGE_PRESENT) )
+
+        if ( l3dst )
+            /* nothing */;
+        else if ( !(l4e_get_flags(l4e) & _PAGE_PRESENT) )
         {
-            l3dst = alloc_xen_pagetable();
+            mfn_t l3mfn;
+
+            l3dst = alloc_map_clear_xen_pt(&l3mfn);
             BUG_ON(!l3dst);
-            clear_page(l3dst);
             efi_l4_pgtable[l4_table_offset(mfn << PAGE_SHIFT)] =
-                l4e_from_paddr(virt_to_maddr(l3dst), __PAGE_HYPERVISOR);
+                l4e_from_mfn(l3mfn, __PAGE_HYPERVISOR);
         }
         else
-            l3dst = l4e_to_l3e(l4e);
-        l3src = l4e_to_l3e(idle_pg_table[l4_table_offset(va)]);
+            l3dst = map_l3t_from_l4e(l4e);
+
+        if ( !l3src )
+            l3src = map_l3t_from_l4e(idle_pg_table[l4_table_offset(va)]);
         l3dst[l3_table_offset(mfn << PAGE_SHIFT)] = l3src[l3_table_offset(va)];
     }
+
+    unmap_domain_page(l3src);
+    unmap_domain_page(l3dst);
 }
 
 static bool __init ram_range_valid(unsigned long smfn, unsigned long emfn)
-- 
2.16.6




* [PATCH v8 10/15] efi: switch to new APIs in EFI code
  2020-07-27 14:21 [PATCH v8 00/15] switch to domheap for Xen page tables Hongyan Xia
                   ` (8 preceding siblings ...)
  2020-07-27 14:21 ` [PATCH v8 09/15] efi: use new page table APIs in copy_mapping Hongyan Xia
@ 2020-07-27 14:22 ` Hongyan Xia
  2020-07-27 14:22 ` [PATCH v8 11/15] x86/smpboot: add exit path for clone_mapping() Hongyan Xia
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Hongyan Xia @ 2020-07-27 14:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, jgrall, Wei Liu, Jan Beulich, Roger Pau Monné

From: Wei Liu <wei.liu2@citrix.com>

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

---
Changed in v7:
- add blank line after declaration.
- rename efi_l4_pgtable into efi_l4t.
- pass the mapped efi_l4t to copy_mapping() instead of map it again.
- use the alloc_map_clear_xen_pt() API.
- unmap pl3e, pl2e, l1t earlier.
---
 xen/arch/x86/efi/runtime.h | 13 ++++++++---
 xen/common/efi/boot.c      | 55 +++++++++++++++++++++++++++-------------------
 xen/common/efi/efi.h       |  3 ++-
 xen/common/efi/runtime.c   |  8 +++----
 4 files changed, 48 insertions(+), 31 deletions(-)

diff --git a/xen/arch/x86/efi/runtime.h b/xen/arch/x86/efi/runtime.h
index d9eb8f5c27..77866c5f21 100644
--- a/xen/arch/x86/efi/runtime.h
+++ b/xen/arch/x86/efi/runtime.h
@@ -1,12 +1,19 @@
+#include <xen/domain_page.h>
+#include <xen/mm.h>
 #include <asm/atomic.h>
 #include <asm/mc146818rtc.h>
 
 #ifndef COMPAT
-l4_pgentry_t *__read_mostly efi_l4_pgtable;
+mfn_t __read_mostly efi_l4_mfn = INVALID_MFN_INITIALIZER;
 
 void efi_update_l4_pgtable(unsigned int l4idx, l4_pgentry_t l4e)
 {
-    if ( efi_l4_pgtable )
-        l4e_write(efi_l4_pgtable + l4idx, l4e);
+    if ( !mfn_eq(efi_l4_mfn, INVALID_MFN) )
+    {
+        l4_pgentry_t *efi_l4t = map_domain_page(efi_l4_mfn);
+
+        l4e_write(efi_l4t + l4idx, l4e);
+        unmap_domain_page(efi_l4t);
+    }
 }
 #endif
diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c
index f116759538..f2f1dbbc77 100644
--- a/xen/common/efi/boot.c
+++ b/xen/common/efi/boot.c
@@ -1440,14 +1440,15 @@ custom_param("efi", parse_efi_param);
 
 static __init void copy_mapping(unsigned long mfn, unsigned long end,
                                 bool (*is_valid)(unsigned long smfn,
-                                                 unsigned long emfn))
+                                                 unsigned long emfn),
+                                l4_pgentry_t *efi_l4t)
 {
     unsigned long next;
     l3_pgentry_t *l3src = NULL, *l3dst = NULL;
 
     for ( ; mfn < end; mfn = next )
     {
-        l4_pgentry_t l4e = efi_l4_pgtable[l4_table_offset(mfn << PAGE_SHIFT)];
+        l4_pgentry_t l4e = efi_l4t[l4_table_offset(mfn << PAGE_SHIFT)];
         unsigned long va = (unsigned long)mfn_to_virt(mfn);
 
         if ( !(mfn & ((1UL << (L4_PAGETABLE_SHIFT - PAGE_SHIFT)) - 1)) )
@@ -1466,7 +1467,7 @@ static __init void copy_mapping(unsigned long mfn, unsigned long end,
 
             l3dst = alloc_map_clear_xen_pt(&l3mfn);
             BUG_ON(!l3dst);
-            efi_l4_pgtable[l4_table_offset(mfn << PAGE_SHIFT)] =
+            efi_l4t[l4_table_offset(mfn << PAGE_SHIFT)] =
                 l4e_from_mfn(l3mfn, __PAGE_HYPERVISOR);
         }
         else
@@ -1499,6 +1500,7 @@ static bool __init rt_range_valid(unsigned long smfn, unsigned long emfn)
 void __init efi_init_memory(void)
 {
     unsigned int i;
+    l4_pgentry_t *efi_l4t;
     struct rt_extra {
         struct rt_extra *next;
         unsigned long smfn, emfn;
@@ -1613,11 +1615,10 @@ void __init efi_init_memory(void)
      * Set up 1:1 page tables for runtime calls. See SetVirtualAddressMap() in
      * efi_exit_boot().
      */
-    efi_l4_pgtable = alloc_xen_pagetable();
-    BUG_ON(!efi_l4_pgtable);
-    clear_page(efi_l4_pgtable);
+    efi_l4t = alloc_map_clear_xen_pt(&efi_l4_mfn);
+    BUG_ON(!efi_l4t);
 
-    copy_mapping(0, max_page, ram_range_valid);
+    copy_mapping(0, max_page, ram_range_valid, efi_l4t);
 
     /* Insert non-RAM runtime mappings inside the direct map. */
     for ( i = 0; i < efi_memmap_size; i += efi_mdesc_size )
@@ -1633,58 +1634,64 @@ void __init efi_init_memory(void)
             copy_mapping(PFN_DOWN(desc->PhysicalStart),
                          PFN_UP(desc->PhysicalStart +
                                 (desc->NumberOfPages << EFI_PAGE_SHIFT)),
-                         rt_range_valid);
+                         rt_range_valid, efi_l4t);
     }
 
     /* Insert non-RAM runtime mappings outside of the direct map. */
     while ( (extra = extra_head) != NULL )
     {
         unsigned long addr = extra->smfn << PAGE_SHIFT;
-        l4_pgentry_t l4e = efi_l4_pgtable[l4_table_offset(addr)];
+        l4_pgentry_t l4e = efi_l4t[l4_table_offset(addr)];
         l3_pgentry_t *pl3e;
         l2_pgentry_t *pl2e;
         l1_pgentry_t *l1t;
 
         if ( !(l4e_get_flags(l4e) & _PAGE_PRESENT) )
         {
-            pl3e = alloc_xen_pagetable();
+            mfn_t l3mfn;
+
+            pl3e = alloc_map_clear_xen_pt(&l3mfn);
             BUG_ON(!pl3e);
-            clear_page(pl3e);
-            efi_l4_pgtable[l4_table_offset(addr)] =
-                l4e_from_paddr(virt_to_maddr(pl3e), __PAGE_HYPERVISOR);
+            efi_l4t[l4_table_offset(addr)] =
+                l4e_from_mfn(l3mfn, __PAGE_HYPERVISOR);
         }
         else
-            pl3e = l4e_to_l3e(l4e);
+            pl3e = map_l3t_from_l4e(l4e);
         pl3e += l3_table_offset(addr);
         if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
         {
-            pl2e = alloc_xen_pagetable();
+            mfn_t l2mfn;
+
+            pl2e = alloc_map_clear_xen_pt(&l2mfn);
             BUG_ON(!pl2e);
-            clear_page(pl2e);
-            *pl3e = l3e_from_paddr(virt_to_maddr(pl2e), __PAGE_HYPERVISOR);
+            *pl3e = l3e_from_mfn(l2mfn, __PAGE_HYPERVISOR);
         }
         else
         {
             BUG_ON(l3e_get_flags(*pl3e) & _PAGE_PSE);
-            pl2e = l3e_to_l2e(*pl3e);
+            pl2e = map_l2t_from_l3e(*pl3e);
         }
+        UNMAP_DOMAIN_PAGE(pl3e);
         pl2e += l2_table_offset(addr);
         if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
         {
-            l1t = alloc_xen_pagetable();
+            mfn_t l1mfn;
+
+            l1t = alloc_map_clear_xen_pt(&l1mfn);
             BUG_ON(!l1t);
-            clear_page(l1t);
-            *pl2e = l2e_from_paddr(virt_to_maddr(l1t), __PAGE_HYPERVISOR);
+            *pl2e = l2e_from_mfn(l1mfn, __PAGE_HYPERVISOR);
         }
         else
         {
             BUG_ON(l2e_get_flags(*pl2e) & _PAGE_PSE);
-            l1t = l2e_to_l1e(*pl2e);
+            l1t = map_l1t_from_l2e(*pl2e);
         }
+        UNMAP_DOMAIN_PAGE(pl2e);
         for ( i = l1_table_offset(addr);
               i < L1_PAGETABLE_ENTRIES && extra->smfn < extra->emfn;
               ++i, ++extra->smfn )
             l1t[i] = l1e_from_pfn(extra->smfn, extra->prot);
+        UNMAP_DOMAIN_PAGE(l1t);
 
         if ( extra->smfn == extra->emfn )
         {
@@ -1696,6 +1703,8 @@ void __init efi_init_memory(void)
     /* Insert Xen mappings. */
     for ( i = l4_table_offset(HYPERVISOR_VIRT_START);
           i < l4_table_offset(DIRECTMAP_VIRT_END); ++i )
-        efi_l4_pgtable[i] = idle_pg_table[i];
+        efi_l4t[i] = idle_pg_table[i];
+
+    unmap_domain_page(efi_l4t);
 }
 #endif
diff --git a/xen/common/efi/efi.h b/xen/common/efi/efi.h
index 2e38d05f3d..e364bae3e0 100644
--- a/xen/common/efi/efi.h
+++ b/xen/common/efi/efi.h
@@ -6,6 +6,7 @@
 #include <efi/eficapsule.h>
 #include <efi/efiapi.h>
 #include <xen/efi.h>
+#include <xen/mm.h>
 #include <xen/spinlock.h>
 #include <asm/page.h>
 
@@ -29,7 +30,7 @@ extern UINTN efi_memmap_size, efi_mdesc_size;
 extern void *efi_memmap;
 
 #ifdef CONFIG_X86
-extern l4_pgentry_t *efi_l4_pgtable;
+extern mfn_t efi_l4_mfn;
 #endif
 
 extern const struct efi_pci_rom *efi_pci_roms;
diff --git a/xen/common/efi/runtime.c b/xen/common/efi/runtime.c
index 95367694b5..375b94229e 100644
--- a/xen/common/efi/runtime.c
+++ b/xen/common/efi/runtime.c
@@ -85,7 +85,7 @@ struct efi_rs_state efi_rs_enter(void)
     static const u32 mxcsr = MXCSR_DEFAULT;
     struct efi_rs_state state = { .cr3 = 0 };
 
-    if ( !efi_l4_pgtable )
+    if ( mfn_eq(efi_l4_mfn, INVALID_MFN) )
         return state;
 
     state.cr3 = read_cr3();
@@ -111,7 +111,7 @@ struct efi_rs_state efi_rs_enter(void)
         lgdt(&gdt_desc);
     }
 
-    switch_cr3_cr4(virt_to_maddr(efi_l4_pgtable), read_cr4());
+    switch_cr3_cr4(mfn_to_maddr(efi_l4_mfn), read_cr4());
 
     return state;
 }
@@ -140,9 +140,9 @@ void efi_rs_leave(struct efi_rs_state *state)
 
 bool efi_rs_using_pgtables(void)
 {
-    return efi_l4_pgtable &&
+    return !mfn_eq(efi_l4_mfn, INVALID_MFN) &&
            (smp_processor_id() == efi_rs_on_cpu) &&
-           (read_cr3() == virt_to_maddr(efi_l4_pgtable));
+           (read_cr3() == mfn_to_maddr(efi_l4_mfn));
 }
 
 unsigned long efi_get_time(void)
-- 
2.16.6




* [PATCH v8 11/15] x86/smpboot: add exit path for clone_mapping()
  2020-07-27 14:21 [PATCH v8 00/15] switch to domheap for Xen page tables Hongyan Xia
                   ` (9 preceding siblings ...)
  2020-07-27 14:22 ` [PATCH v8 10/15] efi: switch to new APIs in EFI code Hongyan Xia
@ 2020-07-27 14:22 ` Hongyan Xia
  2020-07-27 14:22 ` [PATCH v8 12/15] x86/smpboot: switch clone_mapping() to new APIs Hongyan Xia
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Hongyan Xia @ 2020-07-27 14:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, jgrall, Wei Liu, Jan Beulich, Roger Pau Monné

From: Wei Liu <wei.liu2@citrix.com>

We will soon need to clean up page table mappings in the exit path.

No functional change.
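
For illustration only (a sketch; the unmaps themselves arrive in a later
patch of this series), the single exit path will eventually carry the
cleanup of the transient page-table mappings, roughly:

    rc = 0;
 out:
    unmap_domain_page(pl1e);
    unmap_domain_page(pl2e);
    unmap_domain_page(pl3e);
    return rc;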

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

---
Changed in v7:
- edit commit message.
- begin with rc = 0 and set it to -ENOMEM ahead of if().
---
 xen/arch/x86/smpboot.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 5708573c41..05df08f945 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -676,6 +676,7 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
     l3_pgentry_t *pl3e;
     l2_pgentry_t *pl2e;
     l1_pgentry_t *pl1e;
+    int rc = 0;
 
     /*
      * Sanity check 'linear'.  We only allow cloning from the Xen virtual
@@ -716,7 +717,7 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
             pl1e = l2e_to_l1e(*pl2e) + l1_table_offset(linear);
             flags = l1e_get_flags(*pl1e);
             if ( !(flags & _PAGE_PRESENT) )
-                return 0;
+                goto out;
             pfn = l1e_get_pfn(*pl1e);
         }
     }
@@ -724,8 +725,9 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
     if ( !(root_get_flags(rpt[root_table_offset(linear)]) & _PAGE_PRESENT) )
     {
         pl3e = alloc_xen_pagetable();
+        rc = -ENOMEM;
         if ( !pl3e )
-            return -ENOMEM;
+            goto out;
         clear_page(pl3e);
         l4e_write(&rpt[root_table_offset(linear)],
                   l4e_from_paddr(__pa(pl3e), __PAGE_HYPERVISOR));
@@ -738,8 +740,9 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
     if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
     {
         pl2e = alloc_xen_pagetable();
+        rc = -ENOMEM;
         if ( !pl2e )
-            return -ENOMEM;
+            goto out;
         clear_page(pl2e);
         l3e_write(pl3e, l3e_from_paddr(__pa(pl2e), __PAGE_HYPERVISOR));
     }
@@ -754,8 +757,9 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
     if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
     {
         pl1e = alloc_xen_pagetable();
+        rc = -ENOMEM;
         if ( !pl1e )
-            return -ENOMEM;
+            goto out;
         clear_page(pl1e);
         l2e_write(pl2e, l2e_from_paddr(__pa(pl1e), __PAGE_HYPERVISOR));
     }
@@ -776,7 +780,9 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
     else
         l1e_write(pl1e, l1e_from_pfn(pfn, flags));
 
-    return 0;
+    rc = 0;
+ out:
+    return rc;
 }
 
 DEFINE_PER_CPU(root_pgentry_t *, root_pgt);
-- 
2.16.6




* [PATCH v8 12/15] x86/smpboot: switch clone_mapping() to new APIs
  2020-07-27 14:21 [PATCH v8 00/15] switch to domheap for Xen page tables Hongyan Xia
                   ` (10 preceding siblings ...)
  2020-07-27 14:22 ` [PATCH v8 11/15] x86/smpboot: add exit path for clone_mapping() Hongyan Xia
@ 2020-07-27 14:22 ` Hongyan Xia
  2020-07-27 14:22 ` [PATCH v8 13/15] x86/mm: drop old page table APIs Hongyan Xia
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Hongyan Xia @ 2020-07-27 14:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, jgrall, Wei Liu, Jan Beulich, Roger Pau Monné

From: Wei Liu <wei.liu2@citrix.com>

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

---
Changed in v7:
- change patch title
- remove initialiser of pl3e.
- combine the initialisation of pl3e into a single assignment.
- use the new alloc_map_clear() helper.
- use the normal map_domain_page() in the error path.
---
 xen/arch/x86/smpboot.c | 44 +++++++++++++++++++++++++++-----------------
 1 file changed, 27 insertions(+), 17 deletions(-)

diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 05df08f945..c965222e19 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -674,8 +674,8 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
     unsigned long linear = (unsigned long)ptr, pfn;
     unsigned int flags;
     l3_pgentry_t *pl3e;
-    l2_pgentry_t *pl2e;
-    l1_pgentry_t *pl1e;
+    l2_pgentry_t *pl2e = NULL;
+    l1_pgentry_t *pl1e = NULL;
     int rc = 0;
 
     /*
@@ -690,7 +690,7 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
          (linear >= XEN_VIRT_END && linear < DIRECTMAP_VIRT_START) )
         return -EINVAL;
 
-    pl3e = l4e_to_l3e(idle_pg_table[root_table_offset(linear)]) +
+    pl3e = map_l3t_from_l4e(idle_pg_table[root_table_offset(linear)]) +
         l3_table_offset(linear);
 
     flags = l3e_get_flags(*pl3e);
@@ -703,7 +703,7 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
     }
     else
     {
-        pl2e = l3e_to_l2e(*pl3e) + l2_table_offset(linear);
+        pl2e = map_l2t_from_l3e(*pl3e) + l2_table_offset(linear);
         flags = l2e_get_flags(*pl2e);
         ASSERT(flags & _PAGE_PRESENT);
         if ( flags & _PAGE_PSE )
@@ -714,7 +714,7 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
         }
         else
         {
-            pl1e = l2e_to_l1e(*pl2e) + l1_table_offset(linear);
+            pl1e = map_l1t_from_l2e(*pl2e) + l1_table_offset(linear);
             flags = l1e_get_flags(*pl1e);
             if ( !(flags & _PAGE_PRESENT) )
                 goto out;
@@ -722,51 +722,58 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
         }
     }
 
+    UNMAP_DOMAIN_PAGE(pl1e);
+    UNMAP_DOMAIN_PAGE(pl2e);
+    UNMAP_DOMAIN_PAGE(pl3e);
+
     if ( !(root_get_flags(rpt[root_table_offset(linear)]) & _PAGE_PRESENT) )
     {
-        pl3e = alloc_xen_pagetable();
+        mfn_t l3mfn;
+
+        pl3e = alloc_map_clear_xen_pt(&l3mfn);
         rc = -ENOMEM;
         if ( !pl3e )
             goto out;
-        clear_page(pl3e);
         l4e_write(&rpt[root_table_offset(linear)],
-                  l4e_from_paddr(__pa(pl3e), __PAGE_HYPERVISOR));
+                  l4e_from_mfn(l3mfn, __PAGE_HYPERVISOR));
     }
     else
-        pl3e = l4e_to_l3e(rpt[root_table_offset(linear)]);
+        pl3e = map_l3t_from_l4e(rpt[root_table_offset(linear)]);
 
     pl3e += l3_table_offset(linear);
 
     if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
     {
-        pl2e = alloc_xen_pagetable();
+        mfn_t l2mfn;
+
+        pl2e = alloc_map_clear_xen_pt(&l2mfn);
         rc = -ENOMEM;
         if ( !pl2e )
             goto out;
-        clear_page(pl2e);
-        l3e_write(pl3e, l3e_from_paddr(__pa(pl2e), __PAGE_HYPERVISOR));
+        l3e_write(pl3e, l3e_from_mfn(l2mfn, __PAGE_HYPERVISOR));
     }
     else
     {
         ASSERT(!(l3e_get_flags(*pl3e) & _PAGE_PSE));
-        pl2e = l3e_to_l2e(*pl3e);
+        pl2e = map_l2t_from_l3e(*pl3e);
     }
 
     pl2e += l2_table_offset(linear);
 
     if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
     {
-        pl1e = alloc_xen_pagetable();
+        mfn_t l1mfn;
+
+        pl1e = alloc_map_clear_xen_pt(&l1mfn);
         rc = -ENOMEM;
         if ( !pl1e )
             goto out;
-        clear_page(pl1e);
-        l2e_write(pl2e, l2e_from_paddr(__pa(pl1e), __PAGE_HYPERVISOR));
+        l2e_write(pl2e, l2e_from_mfn(l1mfn, __PAGE_HYPERVISOR));
     }
     else
     {
         ASSERT(!(l2e_get_flags(*pl2e) & _PAGE_PSE));
-        pl1e = l2e_to_l1e(*pl2e);
+        pl1e = map_l1t_from_l2e(*pl2e);
     }
 
     pl1e += l1_table_offset(linear);
@@ -782,6 +789,9 @@ static int clone_mapping(const void *ptr, root_pgentry_t *rpt)
 
     rc = 0;
  out:
+    unmap_domain_page(pl1e);
+    unmap_domain_page(pl2e);
+    unmap_domain_page(pl3e);
     return rc;
 }
 
-- 
2.16.6




* [PATCH v8 13/15] x86/mm: drop old page table APIs
  2020-07-27 14:21 [PATCH v8 00/15] switch to domheap for Xen page tables Hongyan Xia
                   ` (11 preceding siblings ...)
  2020-07-27 14:22 ` [PATCH v8 12/15] x86/smpboot: switch clone_mapping() to new APIs Hongyan Xia
@ 2020-07-27 14:22 ` Hongyan Xia
  2020-07-27 14:22 ` [PATCH v8 14/15] x86: switch to use domheap page for page tables Hongyan Xia
  2020-07-27 14:22 ` [PATCH v8 15/15] x86/mm: drop _new suffix for page table APIs Hongyan Xia
  14 siblings, 0 replies; 31+ messages in thread
From: Hongyan Xia @ 2020-07-27 14:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, jgrall, Wei Liu, Jan Beulich, Roger Pau Monné

From: Hongyan Xia <hongyxia@amazon.com>

Two sets of old APIs, alloc/free_xen_pagetable() and lXe_to_lYe(), are
now dropped to avoid the dependency on the direct map.

There are two special cases which have not yet been rewritten to use
the new APIs and thus need special treatment:

rpt in smpboot.c cannot use ephemeral mappings yet. The problem is that
rpt is read and written in context switch code, but the mapping
infrastructure is NOT context-switch-safe, meaning we cannot map rpt in
one domain and unmap in another. Before the mapping infrastructure
supports context switches, rpt has to be globally mapped.

Also, lXe_to_lYe() during Xen image relocation cannot be converted into
map/unmap pairs. We cannot hold on to mappings while the mapping
infrastructure is being relocated! It is enough to remove the direct map
in the second e820 pass, so we still use the direct map (<4GiB) in Xen
relocation (which is during the first e820 pass).
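
To illustrate (a sketch only, not part of the diff below; the function
and its pl2e parameter are made up), the alloc-map-unmap-free lifecycle
the remaining APIs expect from callers is roughly:

    static int example_populate_l2(l2_pgentry_t *pl2e)
    {
        mfn_t l1mfn;
        l1_pgentry_t *l1t = alloc_map_clear_xen_pt(&l1mfn); /* alloc+map+clear */

        if ( !l1t )
            return -ENOMEM;

        /* ... fill l1t[] through the transient mapping ... */

        l2e_write(pl2e, l2e_from_mfn(l1mfn, __PAGE_HYPERVISOR));
        unmap_domain_page(l1t); /* drop the mapping; the page stays allocated */

        /* Teardown (elsewhere): unhook the entry, then free_xen_pagetable_new(l1mfn). */
        return 0;
    }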

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/mm.c          | 14 --------------
 xen/arch/x86/setup.c       |  4 ++--
 xen/arch/x86/smpboot.c     |  4 ++--
 xen/include/asm-x86/mm.h   |  2 --
 xen/include/asm-x86/page.h |  5 -----
 5 files changed, 4 insertions(+), 25 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 199940a345..76b8c681c9 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4925,20 +4925,6 @@ int mmcfg_intercept_write(
     return X86EMUL_OKAY;
 }
 
-void *alloc_xen_pagetable(void)
-{
-    mfn_t mfn = alloc_xen_pagetable_new();
-
-    return mfn_eq(mfn, INVALID_MFN) ? NULL : mfn_to_virt(mfn_x(mfn));
-}
-
-void free_xen_pagetable(void *v)
-{
-    mfn_t mfn = v ? virt_to_mfn(v) : INVALID_MFN;
-
-    free_xen_pagetable_new(mfn);
-}
-
 /*
  * For these PTE APIs, the caller must follow the alloc-map-unmap-free
  * lifecycle, which means explicitly mapping the PTE pages before accessing
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index c9b6af826d..1f73589d5b 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1247,7 +1247,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
                     continue;
                 *pl4e = l4e_from_intpte(l4e_get_intpte(*pl4e) +
                                         xen_phys_start);
-                pl3e = l4e_to_l3e(*pl4e);
+                pl3e = __va(l4e_get_paddr(*pl4e));
                 for ( j = 0; j < L3_PAGETABLE_ENTRIES; j++, pl3e++ )
                 {
                     /* Not present, 1GB mapping, or already relocated? */
@@ -1257,7 +1257,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
                         continue;
                     *pl3e = l3e_from_intpte(l3e_get_intpte(*pl3e) +
                                             xen_phys_start);
-                    pl2e = l3e_to_l2e(*pl3e);
+                    pl2e = __va(l3e_get_paddr(*pl3e));
                     for ( k = 0; k < L2_PAGETABLE_ENTRIES; k++, pl2e++ )
                     {
                         /* Not present, PSE, or already relocated? */
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index c965222e19..f431f526da 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -810,7 +810,7 @@ static int setup_cpu_root_pgt(unsigned int cpu)
     if ( !opt_xpti_hwdom && !opt_xpti_domu )
         return 0;
 
-    rpt = alloc_xen_pagetable();
+    rpt = alloc_xenheap_page();
     if ( !rpt )
         return -ENOMEM;
 
@@ -913,7 +913,7 @@ static void cleanup_cpu_root_pgt(unsigned int cpu)
         free_xen_pagetable_new(l3mfn);
     }
 
-    free_xen_pagetable(rpt);
+    free_xenheap_page(rpt);
 
     /* Also zap the stub mapping for this CPU. */
     if ( stub_linear )
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 5b76308948..1bd8198133 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -582,8 +582,6 @@ int vcpu_destroy_pagetables(struct vcpu *);
 void *do_page_walk(struct vcpu *v, unsigned long addr);
 
 /* Allocator functions for Xen pagetables. */
-void *alloc_xen_pagetable(void);
-void free_xen_pagetable(void *v);
 mfn_t alloc_xen_pagetable_new(void);
 void free_xen_pagetable_new(mfn_t mfn);
 void *alloc_map_clear_xen_pt(mfn_t *pmfn);
diff --git a/xen/include/asm-x86/page.h b/xen/include/asm-x86/page.h
index 608a048c28..45ed561772 100644
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -188,11 +188,6 @@ static inline l4_pgentry_t l4e_from_paddr(paddr_t pa, unsigned int flags)
 #define l4e_has_changed(x,y,flags) \
     ( !!(((x).l4 ^ (y).l4) & ((PADDR_MASK&PAGE_MASK)|put_pte_flags(flags))) )
 
-/* Pagetable walking. */
-#define l2e_to_l1e(x)              ((l1_pgentry_t *)__va(l2e_get_paddr(x)))
-#define l3e_to_l2e(x)              ((l2_pgentry_t *)__va(l3e_get_paddr(x)))
-#define l4e_to_l3e(x)              ((l3_pgentry_t *)__va(l4e_get_paddr(x)))
-
 #define map_l1t_from_l2e(x)        (l1_pgentry_t *)map_domain_page(l2e_get_mfn(x))
 #define map_l2t_from_l3e(x)        (l2_pgentry_t *)map_domain_page(l3e_get_mfn(x))
 #define map_l3t_from_l4e(x)        (l3_pgentry_t *)map_domain_page(l4e_get_mfn(x))
-- 
2.16.6




* [PATCH v8 14/15] x86: switch to use domheap page for page tables
  2020-07-27 14:21 [PATCH v8 00/15] switch to domheap for Xen page tables Hongyan Xia
                   ` (12 preceding siblings ...)
  2020-07-27 14:22 ` [PATCH v8 13/15] x86/mm: drop old page table APIs Hongyan Xia
@ 2020-07-27 14:22 ` Hongyan Xia
  2020-07-27 14:22 ` [PATCH v8 15/15] x86/mm: drop _new suffix for page table APIs Hongyan Xia
  14 siblings, 0 replies; 31+ messages in thread
From: Hongyan Xia @ 2020-07-27 14:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, jgrall, Wei Liu, Jan Beulich, Roger Pau Monné

From: Hongyan Xia <hongyxia@amazon.com>

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

---
Changed in v8:
- const qualify pg in alloc_xen_pagetable_new().
---
 xen/arch/x86/mm.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 76b8c681c9..8348f6329f 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4935,10 +4935,10 @@ mfn_t alloc_xen_pagetable_new(void)
 {
     if ( system_state != SYS_STATE_early_boot )
     {
-        void *ptr = alloc_xenheap_page();
+        const struct page_info *pg = alloc_domheap_page(NULL, 0);
 
-        BUG_ON(!hardware_domain && !ptr);
-        return ptr ? virt_to_mfn(ptr) : INVALID_MFN;
+        BUG_ON(!hardware_domain && !pg);
+        return pg ? page_to_mfn(pg) : INVALID_MFN;
     }
 
     return alloc_boot_pages(1, 1);
@@ -4948,7 +4948,7 @@ mfn_t alloc_xen_pagetable_new(void)
 void free_xen_pagetable_new(mfn_t mfn)
 {
     if ( system_state != SYS_STATE_early_boot && !mfn_eq(mfn, INVALID_MFN) )
-        free_xenheap_page(mfn_to_virt(mfn_x(mfn)));
+        free_domheap_page(mfn_to_page(mfn));
 }
 
 void *alloc_map_clear_xen_pt(mfn_t *pmfn)
-- 
2.16.6




* [PATCH v8 15/15] x86/mm: drop _new suffix for page table APIs
  2020-07-27 14:21 [PATCH v8 00/15] switch to domheap for Xen page tables Hongyan Xia
                   ` (13 preceding siblings ...)
  2020-07-27 14:22 ` [PATCH v8 14/15] x86: switch to use domheap page for page tables Hongyan Xia
@ 2020-07-27 14:22 ` Hongyan Xia
  14 siblings, 0 replies; 31+ messages in thread
From: Hongyan Xia @ 2020-07-27 14:22 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, jgrall, Wei Liu, Jan Beulich, Roger Pau Monné

From: Wei Liu <wei.liu2@citrix.com>

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
 xen/arch/x86/mm.c        | 44 ++++++++++++++++++++++----------------------
 xen/arch/x86/smpboot.c   |  6 +++---
 xen/arch/x86/x86_64/mm.c |  2 +-
 xen/include/asm-x86/mm.h |  4 ++--
 4 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 8348f6329f..465a5bf0df 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -356,7 +356,7 @@ void __init arch_init_memory(void)
             ASSERT(root_pgt_pv_xen_slots < ROOT_PAGETABLE_PV_XEN_SLOTS);
             if ( l4_table_offset(split_va) == l4_table_offset(split_va - 1) )
             {
-                mfn_t l3mfn = alloc_xen_pagetable_new();
+                mfn_t l3mfn = alloc_xen_pagetable();
 
                 if ( !mfn_eq(l3mfn, INVALID_MFN) )
                 {
@@ -4931,7 +4931,7 @@ int mmcfg_intercept_write(
  * them. The caller must check whether the allocation has succeeded, and only
  * pass valid MFNs to map_domain_page().
  */
-mfn_t alloc_xen_pagetable_new(void)
+mfn_t alloc_xen_pagetable(void)
 {
     if ( system_state != SYS_STATE_early_boot )
     {
@@ -4945,7 +4945,7 @@ mfn_t alloc_xen_pagetable_new(void)
 }
 
 /* mfn can be INVALID_MFN */
-void free_xen_pagetable_new(mfn_t mfn)
+void free_xen_pagetable(mfn_t mfn)
 {
     if ( system_state != SYS_STATE_early_boot && !mfn_eq(mfn, INVALID_MFN) )
         free_domheap_page(mfn_to_page(mfn));
@@ -4953,7 +4953,7 @@ void free_xen_pagetable_new(mfn_t mfn)
 
 void *alloc_map_clear_xen_pt(mfn_t *pmfn)
 {
-    mfn_t mfn = alloc_xen_pagetable_new();
+    mfn_t mfn = alloc_xen_pagetable();
     void *ret;
 
     if ( mfn_eq(mfn, INVALID_MFN) )
@@ -4999,7 +4999,7 @@ static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)
         }
         if ( locking )
             spin_unlock(&map_pgdir_lock);
-        free_xen_pagetable_new(l3mfn);
+        free_xen_pagetable(l3mfn);
     }
 
     return map_l3t_from_l4e(*pl4e) + l3_table_offset(v);
@@ -5034,7 +5034,7 @@ static l2_pgentry_t *virt_to_xen_l2e(unsigned long v)
         }
         if ( locking )
             spin_unlock(&map_pgdir_lock);
-        free_xen_pagetable_new(l2mfn);
+        free_xen_pagetable(l2mfn);
     }
 
     BUG_ON(l3e_get_flags(*pl3e) & _PAGE_PSE);
@@ -5073,7 +5073,7 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
         }
         if ( locking )
             spin_unlock(&map_pgdir_lock);
-        free_xen_pagetable_new(l1mfn);
+        free_xen_pagetable(l1mfn);
     }
 
     BUG_ON(l2e_get_flags(*pl2e) & _PAGE_PSE);
@@ -5182,10 +5182,10 @@ int map_pages_to_xen(
                         ol2e = l2t[i];
                         if ( (l2e_get_flags(ol2e) & _PAGE_PRESENT) &&
                              !(l2e_get_flags(ol2e) & _PAGE_PSE) )
-                            free_xen_pagetable_new(l2e_get_mfn(ol2e));
+                            free_xen_pagetable(l2e_get_mfn(ol2e));
                     }
                     unmap_domain_page(l2t);
-                    free_xen_pagetable_new(l3e_get_mfn(ol3e));
+                    free_xen_pagetable(l3e_get_mfn(ol3e));
                 }
             }
 
@@ -5224,7 +5224,7 @@ int map_pages_to_xen(
                 continue;
             }
 
-            l2mfn = alloc_xen_pagetable_new();
+            l2mfn = alloc_xen_pagetable();
             if ( mfn_eq(l2mfn, INVALID_MFN) )
                 goto out;
 
@@ -5252,7 +5252,7 @@ int map_pages_to_xen(
                 spin_unlock(&map_pgdir_lock);
             flush_area(virt, flush_flags);
 
-            free_xen_pagetable_new(l2mfn);
+            free_xen_pagetable(l2mfn);
         }
 
         pl2e = virt_to_xen_l2e(virt);
@@ -5286,7 +5286,7 @@ int map_pages_to_xen(
                         flush_flags(l1e_get_flags(l1t[i]));
                     flush_area(virt, flush_flags);
                     unmap_domain_page(l1t);
-                    free_xen_pagetable_new(l2e_get_mfn(ol2e));
+                    free_xen_pagetable(l2e_get_mfn(ol2e));
                 }
             }
 
@@ -5331,7 +5331,7 @@ int map_pages_to_xen(
                     goto check_l3;
                 }
 
-                l1mfn = alloc_xen_pagetable_new();
+                l1mfn = alloc_xen_pagetable();
                 if ( mfn_eq(l1mfn, INVALID_MFN) )
                     goto out;
 
@@ -5358,7 +5358,7 @@ int map_pages_to_xen(
                     spin_unlock(&map_pgdir_lock);
                 flush_area(virt, flush_flags);
 
-                free_xen_pagetable_new(l1mfn);
+                free_xen_pagetable(l1mfn);
             }
 
             pl1e  = map_l1t_from_l2e(*pl2e) + l1_table_offset(virt);
@@ -5424,7 +5424,7 @@ int map_pages_to_xen(
                     flush_area(virt - PAGE_SIZE,
                                FLUSH_TLB_GLOBAL |
                                FLUSH_ORDER(PAGETABLE_ORDER));
-                    free_xen_pagetable_new(l2e_get_mfn(ol2e));
+                    free_xen_pagetable(l2e_get_mfn(ol2e));
                 }
                 else if ( locking )
                     spin_unlock(&map_pgdir_lock);
@@ -5475,7 +5475,7 @@ int map_pages_to_xen(
                 flush_area(virt - PAGE_SIZE,
                            FLUSH_TLB_GLOBAL |
                            FLUSH_ORDER(2*PAGETABLE_ORDER));
-                free_xen_pagetable_new(l3e_get_mfn(ol3e));
+                free_xen_pagetable(l3e_get_mfn(ol3e));
             }
             else if ( locking )
                 spin_unlock(&map_pgdir_lock);
@@ -5564,7 +5564,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
             }
 
             /* PAGE1GB: shatter the superpage and fall through. */
-            l2mfn = alloc_xen_pagetable_new();
+            l2mfn = alloc_xen_pagetable();
             if ( mfn_eq(l2mfn, INVALID_MFN) )
                 goto out;
 
@@ -5588,7 +5588,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
             if ( locking )
                 spin_unlock(&map_pgdir_lock);
 
-            free_xen_pagetable_new(l2mfn);
+            free_xen_pagetable(l2mfn);
         }
 
         /*
@@ -5624,7 +5624,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
             {
                 l1_pgentry_t *l1t;
                 /* PSE: shatter the superpage and try again. */
-                mfn_t l1mfn = alloc_xen_pagetable_new();
+                mfn_t l1mfn = alloc_xen_pagetable();
 
                 if ( mfn_eq(l1mfn, INVALID_MFN) )
                     goto out;
@@ -5648,7 +5648,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                 if ( locking )
                     spin_unlock(&map_pgdir_lock);
 
-                free_xen_pagetable_new(l1mfn);
+                free_xen_pagetable(l1mfn);
             }
         }
         else
@@ -5715,7 +5715,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                 if ( locking )
                     spin_unlock(&map_pgdir_lock);
                 flush_area(NULL, FLUSH_TLB_GLOBAL); /* flush before free */
-                free_xen_pagetable_new(l1mfn);
+                free_xen_pagetable(l1mfn);
             }
             else if ( locking )
                 spin_unlock(&map_pgdir_lock);
@@ -5760,7 +5760,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                 if ( locking )
                     spin_unlock(&map_pgdir_lock);
                 flush_area(NULL, FLUSH_TLB_GLOBAL); /* flush before free */
-                free_xen_pagetable_new(l2mfn);
+                free_xen_pagetable(l2mfn);
             }
             else if ( locking )
                 spin_unlock(&map_pgdir_lock);
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index f431f526da..a01412a986 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -902,15 +902,15 @@ static void cleanup_cpu_root_pgt(unsigned int cpu)
                     continue;
 
                 ASSERT(!(l2e_get_flags(l2t[i2]) & _PAGE_PSE));
-                free_xen_pagetable_new(l2e_get_mfn(l2t[i2]));
+                free_xen_pagetable(l2e_get_mfn(l2t[i2]));
             }
 
             unmap_domain_page(l2t);
-            free_xen_pagetable_new(l2mfn);
+            free_xen_pagetable(l2mfn);
         }
 
         unmap_domain_page(l3t);
-        free_xen_pagetable_new(l3mfn);
+        free_xen_pagetable(l3mfn);
     }
 
     free_xenheap_page(rpt);
diff --git a/xen/arch/x86/x86_64/mm.c b/xen/arch/x86/x86_64/mm.c
index 640f561faf..74c0bbb4aa 100644
--- a/xen/arch/x86/x86_64/mm.c
+++ b/xen/arch/x86/x86_64/mm.c
@@ -622,7 +622,7 @@ void __init paging_init(void)
     UNMAP_DOMAIN_PAGE(l3_ro_mpt);
 
     /* Create user-accessible L2 directory to map the MPT for compat guests. */
-    mfn = alloc_xen_pagetable_new();
+    mfn = alloc_xen_pagetable();
     if ( mfn_eq(mfn, INVALID_MFN) )
         goto nomem;
     compat_idle_pg_table_l2 = map_domain_page_global(mfn);
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 1bd8198133..908d67664d 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -582,8 +582,8 @@ int vcpu_destroy_pagetables(struct vcpu *);
 void *do_page_walk(struct vcpu *v, unsigned long addr);
 
 /* Allocator functions for Xen pagetables. */
-mfn_t alloc_xen_pagetable_new(void);
-void free_xen_pagetable_new(mfn_t mfn);
+mfn_t alloc_xen_pagetable(void);
+void free_xen_pagetable(mfn_t mfn);
 void *alloc_map_clear_xen_pt(mfn_t *pmfn);
 
 l1_pgentry_t *virt_to_xen_l1e(unsigned long v);
-- 
2.16.6




* Re: [PATCH v8 03/15] x86/mm: rewrite virt_to_xen_l*e
  2020-07-27 14:21 ` [PATCH v8 03/15] x86/mm: rewrite virt_to_xen_l*e Hongyan Xia
@ 2020-08-07 14:05   ` Jan Beulich
  2020-08-13 16:08     ` Hongyan Xia
  2020-12-07 15:28   ` Hongyan Xia
  1 sibling, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2020-08-07 14:05 UTC (permalink / raw)
  To: Hongyan Xia
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper, jgrall,
	Ian Jackson, George Dunlap, xen-devel, Roger Pau Monné

On 27.07.2020 16:21, Hongyan Xia wrote:
> From: Wei Liu <wei.liu2@citrix.com>
> 
> Rewrite those functions to use the new APIs. Modify its callers to unmap
> the pointer returned. Since alloc_xen_pagetable_new() is almost never
> useful unless accompanied by page clearing and a mapping, introduce a
> helper alloc_map_clear_xen_pt() for this sequence.
> 
> Note that the change of virt_to_xen_l1e() also requires vmap_to_mfn() to
> unmap the page, which requires domain_page.h header in vmap.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> ---
> Changed in v8:
> - s/virtual address/linear address/.
> - BUG_ON() on NULL return in vmap_to_mfn().

The justification for this should be recorded in the description. In
reply to v7 I did even suggest how to easily address the issue you
did notice with large pages, as well as alternative behavior for
vmap_to_mfn().

> --- a/xen/include/asm-x86/page.h
> +++ b/xen/include/asm-x86/page.h
> @@ -291,7 +291,15 @@ void copy_page_sse2(void *, const void *);
>  #define pfn_to_paddr(pfn)   __pfn_to_paddr(pfn)
>  #define paddr_to_pfn(pa)    __paddr_to_pfn(pa)
>  #define paddr_to_pdx(pa)    pfn_to_pdx(paddr_to_pfn(pa))
> -#define vmap_to_mfn(va)     _mfn(l1e_get_pfn(*virt_to_xen_l1e((unsigned long)(va))))
> +
> +#define vmap_to_mfn(va) ({                                                  \
> +        const l1_pgentry_t *pl1e_ = virt_to_xen_l1e((unsigned long)(va));   \
> +        mfn_t mfn_;                                                         \
> +        BUG_ON(!pl1e_);                                                     \
> +        mfn_ = l1e_get_mfn(*pl1e_);                                         \
> +        unmap_domain_page(pl1e_);                                           \
> +        mfn_; })

Additionally - no idea why I only notice this now, this wants some
further formatting adjustment: Either

#define vmap_to_mfn(va) ({                                                \
        const l1_pgentry_t *pl1e_ = virt_to_xen_l1e((unsigned long)(va)); \
        mfn_t mfn_;                                                       \
        BUG_ON(!pl1e_);                                                   \
        mfn_ = l1e_get_mfn(*pl1e_);                                       \
        unmap_domain_page(pl1e_);                                         \
        mfn_;                                                             \
    })

or (preferably imo)

#define vmap_to_mfn(va) ({                                            \
    const l1_pgentry_t *pl1e_ = virt_to_xen_l1e((unsigned long)(va)); \
    mfn_t mfn_;                                                       \
    BUG_ON(!pl1e_);                                                   \
    mfn_ = l1e_get_mfn(*pl1e_);                                       \
    unmap_domain_page(pl1e_);                                         \
    mfn_;                                                             \
})

Jan



* Re: [PATCH v8 07/15] x86_64/mm: switch to new APIs in paging_init
  2020-07-27 14:21 ` [PATCH v8 07/15] x86_64/mm: switch to new APIs " Hongyan Xia
@ 2020-08-07 14:09   ` Jan Beulich
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2020-08-07 14:09 UTC (permalink / raw)
  To: Hongyan Xia
  Cc: xen-devel, jgrall, Roger Pau Monné, Wei Liu, Andrew Cooper

On 27.07.2020 16:21, Hongyan Xia wrote:
> From: Wei Liu <wei.liu2@citrix.com>
> 
> Map and unmap pages instead of relying on the direct map.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> ---
> Changed in v8:
> - replace l3/2_ro_mpt_mfn with just mfn since their lifetimes do not
>   overlap

Good, but ...

> --- a/xen/arch/x86/x86_64/mm.c
> +++ b/xen/arch/x86/x86_64/mm.c
> @@ -481,6 +481,7 @@ void __init paging_init(void)
>      l3_pgentry_t *l3_ro_mpt;
>      l2_pgentry_t *pl2e = NULL, *l2_ro_mpt = NULL;
>      struct page_info *l1_pg;
> +    mfn_t mfn;
>  
>      /*
>       * We setup the L3s for 1:1 mapping if host support memory hotplug
> @@ -493,22 +494,23 @@ void __init paging_init(void)
>          if ( !(l4e_get_flags(idle_pg_table[l4_table_offset(va)]) &
>                _PAGE_PRESENT) )
>          {
> -            l3_pgentry_t *pl3t = alloc_xen_pagetable();
> +            mfn_t l3mfn;

... what about this one? It's again only used ...

> +            l3_pgentry_t *pl3t = alloc_map_clear_xen_pt(&l3mfn);
>  
>              if ( !pl3t )
>                  goto nomem;
> -            clear_page(pl3t);
> +            UNMAP_DOMAIN_PAGE(pl3t);
>              l4e_write(&idle_pg_table[l4_table_offset(va)],
> -                      l4e_from_paddr(__pa(pl3t), __PAGE_HYPERVISOR_RW));
> +                      l4e_from_mfn(l3mfn, __PAGE_HYPERVISOR_RW));
>          }
>      }
>  
>      /* Create user-accessible L2 directory to map the MPT for guests. */
> -    if ( (l3_ro_mpt = alloc_xen_pagetable()) == NULL )
> +    l3_ro_mpt = alloc_map_clear_xen_pt(&mfn);

... without colliding with this first use of mfn.

Jan



* Re: [PATCH v8 09/15] efi: use new page table APIs in copy_mapping
  2020-07-27 14:21 ` [PATCH v8 09/15] efi: use new page table APIs in copy_mapping Hongyan Xia
@ 2020-08-07 14:13   ` Jan Beulich
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2020-08-07 14:13 UTC (permalink / raw)
  To: Hongyan Xia; +Cc: xen-devel, jgrall

On 27.07.2020 16:21, Hongyan Xia wrote:
> From: Wei Liu <wei.liu2@citrix.com>
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>



* Re: [PATCH v8 03/15] x86/mm: rewrite virt_to_xen_l*e
  2020-08-07 14:05   ` Jan Beulich
@ 2020-08-13 16:08     ` Hongyan Xia
  2020-08-13 17:22       ` Julien Grall
  0 siblings, 1 reply; 31+ messages in thread
From: Hongyan Xia @ 2020-08-13 16:08 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, jgrall, Andrew Cooper, Wei Liu, Roger Pau Monné,
	George Dunlap, Ian Jackson, Julien Grall, Stefano Stabellini

On Fri, 2020-08-07 at 16:05 +0200, Jan Beulich wrote:
> On 27.07.2020 16:21, Hongyan Xia wrote:
> > From: Wei Liu <wei.liu2@citrix.com>
> > 
> > Rewrite those functions to use the new APIs. Modify its callers to
> > unmap the pointer returned. Since alloc_xen_pagetable_new() is
> > almost never useful unless accompanied by page clearing and a
> > mapping, introduce a helper alloc_map_clear_xen_pt() for this
> > sequence.
> > 
> > Note that the change of virt_to_xen_l1e() also requires
> > vmap_to_mfn() to unmap the page, which requires domain_page.h
> > header in vmap.
> > 
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
> > Reviewed-by: Jan Beulich <jbeulich@suse.com>
> > 
> > ---
> > Changed in v8:
> > - s/virtual address/linear address/.
> > - BUG_ON() on NULL return in vmap_to_mfn().
> 
> The justification for this should be recorded in the description. In

Will do.

> reply to v7 I did even suggest how to easily address the issue you
> did notice with large pages, as well as alternative behavior for
> vmap_to_mfn().

One thing about adding SMALL_PAGES is that vmap is common code and I am
not sure if the Arm side is happy with it.

> 
> > --- a/xen/include/asm-x86/page.h
> > +++ b/xen/include/asm-x86/page.h
> > @@ -291,7 +291,15 @@ void copy_page_sse2(void *, const void *);
> >  #define pfn_to_paddr(pfn)   __pfn_to_paddr(pfn)
> >  #define paddr_to_pfn(pa)    __paddr_to_pfn(pa)
> >  #define paddr_to_pdx(pa)    pfn_to_pdx(paddr_to_pfn(pa))
> > -#define vmap_to_mfn(va)     _mfn(l1e_get_pfn(*virt_to_xen_l1e((unsigned long)(va))))
> > +
> > +#define vmap_to_mfn(va) ({                                                  \
> > +        const l1_pgentry_t *pl1e_ = virt_to_xen_l1e((unsigned long)(va));   \
> > +        mfn_t mfn_;                                                         \
> > +        BUG_ON(!pl1e_);                                                     \
> > +        mfn_ = l1e_get_mfn(*pl1e_);                                         \
> > +        unmap_domain_page(pl1e_);                                           \
> > +        mfn_; })
> 
> Additionally - no idea why I only notice this now, this wants some
> further formatting adjustment: Either
> 
> #define vmap_to_mfn(va) ({                                                \
>         const l1_pgentry_t *pl1e_ = virt_to_xen_l1e((unsigned long)(va)); \
>         mfn_t mfn_;                                                       \
>         BUG_ON(!pl1e_);                                                   \
>         mfn_ = l1e_get_mfn(*pl1e_);                                       \
>         unmap_domain_page(pl1e_);                                         \
>         mfn_;                                                             \
>     })
> 
> or (preferably imo)
> 
> #define vmap_to_mfn(va) ({                                            \
>     const l1_pgentry_t *pl1e_ = virt_to_xen_l1e((unsigned long)(va)); \
>     mfn_t mfn_;                                                       \
>     BUG_ON(!pl1e_);                                                   \
>     mfn_ = l1e_get_mfn(*pl1e_);                                       \
>     unmap_domain_page(pl1e_);                                         \
>     mfn_;                                                             \
> })

Will do so in the next rev.

Hongyan




* Re: [PATCH v8 03/15] x86/mm: rewrite virt_to_xen_l*e
  2020-08-13 16:08     ` Hongyan Xia
@ 2020-08-13 17:22       ` Julien Grall
  2020-08-18  8:49         ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Julien Grall @ 2020-08-13 17:22 UTC (permalink / raw)
  To: Hongyan Xia, Jan Beulich
  Cc: xen-devel, jgrall, Andrew Cooper, Wei Liu, Roger Pau Monné,
	George Dunlap, Ian Jackson, Stefano Stabellini

Hi,

On 13/08/2020 17:08, Hongyan Xia wrote:
> On Fri, 2020-08-07 at 16:05 +0200, Jan Beulich wrote:
>> On 27.07.2020 16:21, Hongyan Xia wrote:
>>> From: Wei Liu <wei.liu2@citrix.com>
>>>
>>> Rewrite those functions to use the new APIs. Modify its callers to
>>> unmap the pointer returned. Since alloc_xen_pagetable_new() is
>>> almost never useful unless accompanied by page clearing and a
>>> mapping, introduce a helper alloc_map_clear_xen_pt() for this
>>> sequence.
>>>
>>> Note that the change of virt_to_xen_l1e() also requires
>>> vmap_to_mfn() to unmap the page, which requires domain_page.h
>>> header in vmap.
>>>
>>> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
>>> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>
>>> ---
>>> Changed in v8:
>>> - s/virtual address/linear address/.
>>> - BUG_ON() on NULL return in vmap_to_mfn().
>>
>> The justification for this should be recorded in the description. In
> 
> Will do.
> 
>> reply to v7 I did even suggest how to easily address the issue you
>> did notice with large pages, as well as alternative behavior for
>> vmap_to_mfn().
> 
> One thing about adding SMALL_PAGES is that vmap is common code and I am
> not sure if the Arm side is happy with it.

At the moment, Arm is only using small mapping but I plan to change that 
soon because we have regions that can be fairly big.

Regardless that, the issue with vmap_to_mfn() is rather x86 specific. So 
I don't particularly like the idea to expose such trick in common code.

Even on x86, I think this is not the right approach. Such band-aid will 
impact the performance as, assuming superpages are used, it will take 
longer to map and add pressure on the TLBs.

I am aware that superpages will be useful for LiveUpdate, but is there 
any use cases in upstream? If not, could we just use the BUG_ON() and 
implement correctly vmap_to_mfn() in a follow-up?

Cheers,

-- 
Julien Grall



* Re: [PATCH v8 03/15] x86/mm: rewrite virt_to_xen_l*e
  2020-08-13 17:22       ` Julien Grall
@ 2020-08-18  8:49         ` Jan Beulich
  2020-08-18 10:13           ` Julien Grall
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2020-08-18  8:49 UTC (permalink / raw)
  To: Julien Grall, Hongyan Xia
  Cc: xen-devel, jgrall, Andrew Cooper, Wei Liu, Roger Pau Monné,
	George Dunlap, Ian Jackson, Stefano Stabellini

On 13.08.2020 19:22, Julien Grall wrote:
> Hi,
> 
> On 13/08/2020 17:08, Hongyan Xia wrote:
>> On Fri, 2020-08-07 at 16:05 +0200, Jan Beulich wrote:
>>> On 27.07.2020 16:21, Hongyan Xia wrote:
>>>> From: Wei Liu <wei.liu2@citrix.com>
>>>>
>>>> Rewrite those functions to use the new APIs. Modify its callers to
>>>> unmap the pointer returned. Since alloc_xen_pagetable_new() is
>>>> almost never useful unless accompanied by page clearing and a
>>>> mapping, introduce a helper alloc_map_clear_xen_pt() for this
>>>> sequence.
>>>>
>>>> Note that the change of virt_to_xen_l1e() also requires
>>>> vmap_to_mfn() to unmap the page, which requires domain_page.h
>>>> header in vmap.
>>>>
>>>> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
>>>> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>>
>>>> ---
>>>> Changed in v8:
>>>> - s/virtual address/linear address/.
>>>> - BUG_ON() on NULL return in vmap_to_mfn().
>>>
>>> The justification for this should be recorded in the description. In
>>
>> Will do.
>>
>>> reply to v7 I did even suggest how to easily address the issue you
>>> did notice with large pages, as well as alternative behavior for
>>> vmap_to_mfn().
>>
>> One thing about adding SMALL_PAGES is that vmap is common code and I am
>> not sure if the Arm side is happy with it.
> 
> At the moment, Arm is only using small mapping but I plan to change that soon because we have regions that can be fairly big.
> 
> Regardless that, the issue with vmap_to_mfn() is rather x86 specific. So I don't particularly like the idea to expose such trick in common code.
> 
> Even on x86, I think this is not the right approach. Such band-aid will impact the performance as, assuming superpages are used, it will take longer to map and add pressure on the TLBs.
> 
> I am aware that superpages will be useful for LiveUpdate, but is there any use cases in upstream?

Superpage use by vmalloc() is purely occasional: You'd have to vmalloc()
2Mb or more _and_ the page-wise allocation ought to return 512
consecutive pages in the right order. Getting 512 consecutive pages is
possible in practice, but with the page allocator allocating top-down it
is very unlikely for them to be returned in increasing-sorted order.

> If not, could we just use the BUG_ON() and implement correctly vmap_to_mfn() in a follow-up?

My main concern with the BUG_ON() is that it detects a problem long after
it was introduced (when the mapping was established). I'd rather see a
BUG_ON() added there if use of MAP_SMALL_PAGES is deemed unsuitable.

Jan



* Re: [PATCH v8 03/15] x86/mm: rewrite virt_to_xen_l*e
  2020-08-18  8:49         ` Jan Beulich
@ 2020-08-18 10:13           ` Julien Grall
  2020-08-18 11:30             ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Julien Grall @ 2020-08-18 10:13 UTC (permalink / raw)
  To: Jan Beulich, Hongyan Xia
  Cc: xen-devel, jgrall, Andrew Cooper, Wei Liu, Roger Pau Monné,
	George Dunlap, Ian Jackson, Stefano Stabellini

Hi Jan,

On 18/08/2020 09:49, Jan Beulich wrote:
> On 13.08.2020 19:22, Julien Grall wrote:
>> Hi,
>>
>> On 13/08/2020 17:08, Hongyan Xia wrote:
>>> On Fri, 2020-08-07 at 16:05 +0200, Jan Beulich wrote:
>>>> On 27.07.2020 16:21, Hongyan Xia wrote:
>>>>> From: Wei Liu <wei.liu2@citrix.com>
>>>>>
>>>>> Rewrite those functions to use the new APIs. Modify its callers to
>>>>> unmap the pointer returned. Since alloc_xen_pagetable_new() is
>>>>> almost never useful unless accompanied by page clearing and a
>>>>> mapping, introduce a helper alloc_map_clear_xen_pt() for this
>>>>> sequence.
>>>>>
>>>>> Note that the change of virt_to_xen_l1e() also requires
>>>>> vmap_to_mfn() to unmap the page, which requires domain_page.h
>>>>> header in vmap.
>>>>>
>>>>> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
>>>>> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
>>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>>>
>>>>> ---
>>>>> Changed in v8:
>>>>> - s/virtual address/linear address/.
>>>>> - BUG_ON() on NULL return in vmap_to_mfn().
>>>>
>>>> The justification for this should be recorded in the description. In
>>>
>>> Will do.
>>>
>>>> reply to v7 I did even suggest how to easily address the issue you
>>>> did notice with large pages, as well as alternative behavior for
>>>> vmap_to_mfn().
>>>
>>> One thing about adding SMALL_PAGES is that vmap is common code and I am
>>> not sure if the Arm side is happy with it.
>>
>> At the moment, Arm is only using small mapping but I plan to change that soon because we have regions that can be fairly big.
>>
>> Regardless that, the issue with vmap_to_mfn() is rather x86 specific. So I don't particularly like the idea to expose such trick in common code.
>>
>> Even on x86, I think this is not the right approach. Such band-aid will impact the performance as, assuming superpages are used, it will take longer to map and add pressure on the TLBs.
>>
>> I am aware that superpages will be useful for LiveUpdate, but is there any use cases in upstream?
> 
> Superpage use by vmalloc() is purely occasional: You'd have to vmalloc()
> 2Mb or more _and_ the page-wise allocation ought to return 512
> consecutive pages in the right order. Getting 512 consecutive pages is
> possible in practice, but with the page allocator allocating top-down it
> is very unlikely for them to be returned in increasing-sorted order.
So your assumption here is vmap_to_mfn() can only be called on 
vmalloc-ed() area. While this may be the case in Xen today, the name 
clearly suggest it can be called on all vmap-ed region.

> 
>> If not, could we just use the BUG_ON() and implement correctly vmap_to_mfn() in a follow-up?
> 
> My main concern with the BUG_ON() is that it detects a problem long after
> it was introduced (when the mapping was established). I'd rather see a
> BUG_ON() added there if use of MAP_SMALL_PAGES is deemed unsuitable.

From what you wrote, I would agree that vmalloc() is unlikely going to 
use superpages. But this is not going to solve the underlying problem 
with the rest of the vmap area.

So are you suggesting to use MAP_SMALL_PAGES for *all* the vmap()?

Cheers,

-- 
Julien Grall



* Re: [PATCH v8 03/15] x86/mm: rewrite virt_to_xen_l*e
  2020-08-18 10:13           ` Julien Grall
@ 2020-08-18 11:30             ` Jan Beulich
  2020-08-18 13:08               ` Julien Grall
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2020-08-18 11:30 UTC (permalink / raw)
  To: Julien Grall
  Cc: Hongyan Xia, xen-devel, jgrall, Andrew Cooper, Wei Liu,
	Roger Pau Monné,
	George Dunlap, Ian Jackson, Stefano Stabellini

On 18.08.2020 12:13, Julien Grall wrote:
> Hi Jan,
> 
> On 18/08/2020 09:49, Jan Beulich wrote:
>> On 13.08.2020 19:22, Julien Grall wrote:
>>> Hi,
>>>
>>> On 13/08/2020 17:08, Hongyan Xia wrote:
>>>> On Fri, 2020-08-07 at 16:05 +0200, Jan Beulich wrote:
>>>>> On 27.07.2020 16:21, Hongyan Xia wrote:
>>>>>> From: Wei Liu <wei.liu2@citrix.com>
>>>>>>
>>>>>> Rewrite those functions to use the new APIs. Modify its callers to
>>>>>> unmap the pointer returned. Since alloc_xen_pagetable_new() is
>>>>>> almost never useful unless accompanied by page clearing and a
>>>>>> mapping, introduce a helper alloc_map_clear_xen_pt() for this
>>>>>> sequence.
>>>>>>
>>>>>> Note that the change of virt_to_xen_l1e() also requires
>>>>>> vmap_to_mfn() to unmap the page, which requires domain_page.h
>>>>>> header in vmap.
>>>>>>
>>>>>> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
>>>>>> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
>>>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>>>>
>>>>>> ---
>>>>>> Changed in v8:
>>>>>> - s/virtual address/linear address/.
>>>>>> - BUG_ON() on NULL return in vmap_to_mfn().
>>>>>
>>>>> The justification for this should be recorded in the description. In
>>>>
>>>> Will do.
>>>>
>>>>> reply to v7 I did even suggest how to easily address the issue you
>>>>> did notice with large pages, as well as alternative behavior for
>>>>> vmap_to_mfn().
>>>>
>>>> One thing about adding SMALL_PAGES is that vmap is common code and I am
>>>> not sure if the Arm side is happy with it.
>>>
>>> At the moment, Arm is only using small mapping but I plan to change that soon because we have regions that can be fairly big.
>>>
>>> Regardless that, the issue with vmap_to_mfn() is rather x86 specific. So I don't particularly like the idea to expose such trick in common code.
>>>
>>> Even on x86, I think this is not the right approach. Such band-aid will impact the performance as, assuming superpages are used, it will take longer to map and add pressure on the TLBs.
>>>
>>> I am aware that superpages will be useful for LiveUpdate, but is there any use cases in upstream?
>>
>> Superpage use by vmalloc() is purely occasional: You'd have to vmalloc()
>> 2Mb or more _and_ the page-wise allocation ought to return 512
>> consecutive pages in the right order. Getting 512 consecutive pages is
>> possible in practice, but with the page allocator allocating top-down it
>> is very unlikely for them to be returned in increasing-sorted order.
> So your assumption here is vmap_to_mfn() can only be called on vmalloc-ed() area. While this may be the case in Xen today, the name clearly suggest it can be called on all vmap-ed region.

No, I don't make this assumption, and I did spell this out in an earlier
reply to Hongyan: Parties using vmap() on a sufficiently large address
range with consecutive MFNs simply have to be aware that they may not
call vmap_to_mfn(). And why would they? They had the MFNs in their hands
at the time of mapping, so no need to (re-)obtain them by looking up the
translation.

>>> If not, could we just use the BUG_ON() and implement correctly vmap_to_mfn() in a follow-up?
>>
>> My main concern with the BUG_ON() is that it detects a problem long after
>> it was introduced (when the mapping was established). I'd rather see a
>> BUG_ON() added there if use of MAP_SMALL_PAGES is deemed unsuitable.
> 
> From what you wrote, I would agree that vmalloc() is unlikely going to use superpages. But this is not going to solve the underlying problem with the rest of the vmap area.
> 
> So are you suggesting to use MAP_SMALL_PAGES for *all* the vmap()?

As per above - no.

Jan


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 03/15] x86/mm: rewrite virt_to_xen_l*e
  2020-08-18 11:30             ` Jan Beulich
@ 2020-08-18 13:08               ` Julien Grall
  2020-08-18 16:16                 ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Julien Grall @ 2020-08-18 13:08 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Hongyan Xia, xen-devel, jgrall, Andrew Cooper, Wei Liu,
	Roger Pau Monné,
	George Dunlap, Ian Jackson, Stefano Stabellini

Hi Jan,

On 18/08/2020 12:30, Jan Beulich wrote:
> On 18.08.2020 12:13, Julien Grall wrote:
>> Hi Jan,
>>
>> On 18/08/2020 09:49, Jan Beulich wrote:
>>> On 13.08.2020 19:22, Julien Grall wrote:
>>>> Hi,
>>>>
>>>> On 13/08/2020 17:08, Hongyan Xia wrote:
>>>>> On Fri, 2020-08-07 at 16:05 +0200, Jan Beulich wrote:
>>>>>> On 27.07.2020 16:21, Hongyan Xia wrote:
>>>>>>> From: Wei Liu <wei.liu2@citrix.com>
>>>>>>>
>>>>>>> Rewrite those functions to use the new APIs. Modify its callers to
>>>>>>> unmap
>>>>>>> the pointer returned. Since alloc_xen_pagetable_new() is almost
>>>>>>> never
>>>>>>> useful unless accompanied by page clearing and a mapping, introduce
>>>>>>> a
>>>>>>> helper alloc_map_clear_xen_pt() for this sequence.
>>>>>>>
>>>>>>> Note that the change of virt_to_xen_l1e() also requires
>>>>>>> vmap_to_mfn() to
>>>>>>> unmap the page, which requires domain_page.h header in vmap.
>>>>>>>
>>>>>>> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
>>>>>>> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
>>>>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>>>>>
>>>>>>> ---
>>>>>>> Changed in v8:
>>>>>>> - s/virtual address/linear address/.
>>>>>>> - BUG_ON() on NULL return in vmap_to_mfn().
>>>>>>
>>>>>> The justification for this should be recorded in the description. In
>>>>>
>>>>> Will do.
>>>>>
>>>>>> reply to v7 I did even suggest how to easily address the issue you
>>>>>> did notice with large pages, as well as alternative behavior for
>>>>>> vmap_to_mfn().
>>>>>
>>>>> One thing about adding SMALL_PAGES is that vmap is common code and I am
>>>>> not sure if the Arm side is happy with it.
>>>>
>>>> At the moment, Arm is only using small mapping but I plan to change that soon because we have regions that can be fairly big.
>>>>
>>>> Regardless that, the issue with vmap_to_mfn() is rather x86 specific. So I don't particularly like the idea to expose such trick in common code.
>>>>
>>>> Even on x86, I think this is not the right approach. Such band-aid will impact the performance as, assuming superpages are used, it will take longer to map and add pressure on the TLBs.
>>>>
>>>> I am aware that superpages will be useful for LiveUpdate, but is there any use cases in upstream?
>>>
>>> Superpage use by vmalloc() is purely occasional: You'd have to vmalloc()
>>> 2Mb or more _and_ the page-wise allocation ought to return 512
>>> consecutive pages in the right order. Getting 512 consecutive pages is
>>> possible in practice, but with the page allocator allocating top-down it
>>> is very unlikely for them to be returned in increasing-sorted order.
>> So your assumption here is vmap_to_mfn() can only be called on vmalloc-ed() area. While this may be the case in Xen today, the name clearly suggest it can be called on all vmap-ed region.
> 
> No, I don't make this assumption, and I did spell this out in an earlier
> reply to Hongyan: Parties using vmap() on a sufficiently large address
> range with consecutive MFNs simply have to be aware that they may not
> call vmap_to_mfn().

You make it sound easy to be aware of, but there are two 
implementations of vmap_to_mfn() (one per arch). Even looking at the 
x86 version, it is not obvious that there is a restriction.

So I am a bit concerned about "misuse" of the function. This could 
possibly be documented, although I am not entirely happy to restrict 
its use because of x86.

> And why would they? They had the MFNs in their hands
> at the time of mapping, so no need to (re-)obtain them by looking up the
> translation.

It may not always be convenient to keep the MFN in hand for the duration 
of the mapping.

An example was discussed in [1]. Roughly, the code would look like:

acpi_os_alloc_memory(...)
{
      mfn = alloc_boot_pages(...);
      vmap(mfn, ...);
}


acpi_os_free_memory(...)
{
     mfn = vmap_to_mfn(...);
     vunmap(...);

     init_boot_pages(mfn, ...);
}

>>>> If not, could we just use the BUG_ON() and implement correctly vmap_to_mfn() in a follow-up?
>>>
>>> My main concern with the BUG_ON() is that it detects a problem long after
>>> it was introduced (when the mapping was established). I'd rather see a
>>> BUG_ON() added there if use of MAP_SMALL_PAGES is deemed unsuitable.
>>
>>  From what you wrote, I would agree that vmalloc() is unlikely going to use superpages. But this is not going to solve the underlying problem with the rest of the vmap area.
>>
>> So are you suggesting to use MAP_SMALL_PAGES for *all* the vmap()?
> 
> As per above - no.
> 
> Jan
> 

[1] 
<a71d1903267b84afdb0e54fa2ac55540ab2f9357.1588278317.git.hongyxia@amazon.com>

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 03/15] x86/mm: rewrite virt_to_xen_l*e
  2020-08-18 13:08               ` Julien Grall
@ 2020-08-18 16:16                 ` Jan Beulich
  2020-11-30 12:13                   ` Hongyan Xia
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2020-08-18 16:16 UTC (permalink / raw)
  To: Julien Grall
  Cc: Hongyan Xia, xen-devel, jgrall, Andrew Cooper, Wei Liu,
	Roger Pau Monné,
	George Dunlap, Ian Jackson, Stefano Stabellini

On 18.08.2020 15:08, Julien Grall wrote:
> Hi Jan,
> 
> On 18/08/2020 12:30, Jan Beulich wrote:
>> On 18.08.2020 12:13, Julien Grall wrote:
>>> Hi Jan,
>>>
>>> On 18/08/2020 09:49, Jan Beulich wrote:
>>>> On 13.08.2020 19:22, Julien Grall wrote:
>>>>> Hi,
>>>>>
>>>>> On 13/08/2020 17:08, Hongyan Xia wrote:
>>>>>> On Fri, 2020-08-07 at 16:05 +0200, Jan Beulich wrote:
>>>>>>> On 27.07.2020 16:21, Hongyan Xia wrote:
>>>>>>>> From: Wei Liu <wei.liu2@citrix.com>
>>>>>>>>
>>>>>>>> Rewrite those functions to use the new APIs. Modify its callers to
>>>>>>>> unmap
>>>>>>>> the pointer returned. Since alloc_xen_pagetable_new() is almost
>>>>>>>> never
>>>>>>>> useful unless accompanied by page clearing and a mapping, introduce
>>>>>>>> a
>>>>>>>> helper alloc_map_clear_xen_pt() for this sequence.
>>>>>>>>
>>>>>>>> Note that the change of virt_to_xen_l1e() also requires
>>>>>>>> vmap_to_mfn() to
>>>>>>>> unmap the page, which requires domain_page.h header in vmap.
>>>>>>>>
>>>>>>>> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
>>>>>>>> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
>>>>>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>>>>>>
>>>>>>>> ---
>>>>>>>> Changed in v8:
>>>>>>>> - s/virtual address/linear address/.
>>>>>>>> - BUG_ON() on NULL return in vmap_to_mfn().
>>>>>>>
>>>>>>> The justification for this should be recorded in the description. In
>>>>>>
>>>>>> Will do.
>>>>>>
>>>>>>> reply to v7 I did even suggest how to easily address the issue you
>>>>>>> did notice with large pages, as well as alternative behavior for
>>>>>>> vmap_to_mfn().
>>>>>>
>>>>>> One thing about adding SMALL_PAGES is that vmap is common code and I am
>>>>>> not sure if the Arm side is happy with it.
>>>>>
>>>>> At the moment, Arm is only using small mapping but I plan to change that soon because we have regions that can be fairly big.
>>>>>
>>>>> Regardless that, the issue with vmap_to_mfn() is rather x86 specific. So I don't particularly like the idea to expose such trick in common code.
>>>>>
>>>>> Even on x86, I think this is not the right approach. Such band-aid will impact the performance as, assuming superpages are used, it will take longer to map and add pressure on the TLBs.
>>>>>
>>>>> I am aware that superpages will be useful for LiveUpdate, but is there any use cases in upstream?
>>>>
>>>> Superpage use by vmalloc() is purely occasional: You'd have to vmalloc()
>>>> 2Mb or more _and_ the page-wise allocation ought to return 512
>>>> consecutive pages in the right order. Getting 512 consecutive pages is
>>>> possible in practice, but with the page allocator allocating top-down it
>>>> is very unlikely for them to be returned in increasing-sorted order.
>>> So your assumption here is vmap_to_mfn() can only be called on vmalloc-ed() area. While this may be the case in Xen today, the name clearly suggest it can be called on all vmap-ed region.
>>
>> No, I don't make this assumption, and I did spell this out in an earlier
>> reply to Hongyan: Parties using vmap() on a sufficiently large address
>> range with consecutive MFNs simply have to be aware that they may not
>> call vmap_to_mfn().
> 
> You make it sounds easy to be aware, however there are two implementations of vmap_to_mfn() (one per arch). Even looking at the x86 version, it is not obvious there is a restriction.

I didn't mean to make it sound like this - I agree it's not an obvious
restriction.

> So I am a bit concerned of the "misuse" of the function. This could possibly be documented. Although, I am not entirely happy to restrict the use because of x86.

Unless the underlying issue gets fixed, we need _some_ form of bodge.
I'm not really happy with the BUG_ON() as proposed by Hongyan. You're
not really happy with my first proposed alternative, and you didn't
comment on the 2nd one (still kept in context below). Not sure what
to do: Throw dice?

>> And why would they? They had the MFNs in their hands
>> at the time of mapping, so no need to (re-)obtain them by looking up the
>> translation.
> 
> It may not always be convenient to keep the MFN in hand for the duration of the mapping.
> 
> An example was discussed in [1]. Roughly, the code would look like:
> 
> acpi_os_alloc_memory(...)
> {
>      mfn = alloc_boot_pages(...);
>      vmap(mfn, ...);
> }
> 
> 
> acpi_os_free_memory(...)
> {
>     mfn = vmap_to_mfn(...);
>     vunmap(...);
> 
>     init_boot_pages(mfn, ...);
> }

To a certain degree I'd consider this an abuse of the interface, but
yes, I see your point (and I was aware that there are possible cases
where keeping the MFN(s) in hand may be undesirable). Then again
there's very little use of vmap_to_mfn() in the first place, so I
didn't think it very likely that a problematic case would appear
before the issue gets fixed properly.

Jan

>>>>> If not, could we just use the BUG_ON() and implement correctly vmap_to_mfn() in a follow-up?
>>>>
>>>> My main concern with the BUG_ON() is that it detects a problem long after
>>>> it was introduced (when the mapping was established). I'd rather see a
>>>> BUG_ON() added there if use of MAP_SMALL_PAGES is deemed unsuitable.
>>>
>>>  From what you wrote, I would agree that vmalloc() is unlikely going to use superpages. But this is not going to solve the underlying problem with the rest of the vmap area.
>>>
>>> So are you suggesting to use MAP_SMALL_PAGES for *all* the vmap()?
>>
>> As per above - no.
>>
>> Jan
>>
> 
> [1] <a71d1903267b84afdb0e54fa2ac55540ab2f9357.1588278317.git.hongyxia@amazon.com>
> 



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 03/15] x86/mm: rewrite virt_to_xen_l*e
  2020-08-18 16:16                 ` Jan Beulich
@ 2020-11-30 12:13                   ` Hongyan Xia
  2020-11-30 12:50                     ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Hongyan Xia @ 2020-11-30 12:13 UTC (permalink / raw)
  To: Jan Beulich, Julien Grall
  Cc: xen-devel, jgrall, Andrew Cooper, Wei Liu, Roger Pau Monné,
	George Dunlap, Ian Jackson, Stefano Stabellini

Sorry for the late reply. Been busy with something else.

On Tue, 2020-08-18 at 18:16 +0200, Jan Beulich wrote:
> On 18.08.2020 15:08, Julien Grall wrote:
> > Hi Jan,
> > 
> > On 18/08/2020 12:30, Jan Beulich wrote:
> > > On 18.08.2020 12:13, Julien Grall wrote:
> > > > Hi Jan,
> > > > 
> > > > On 18/08/2020 09:49, Jan Beulich wrote:
> > > > > On 13.08.2020 19:22, Julien Grall wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > On 13/08/2020 17:08, Hongyan Xia wrote:
> > > > > > > On Fri, 2020-08-07 at 16:05 +0200, Jan Beulich wrote:
> > > > > > > > On 27.07.2020 16:21, Hongyan Xia wrote:
> > > > > > > > > From: Wei Liu <wei.liu2@citrix.com>
> > > > > > > > > 
> > > > > > > > > Rewrite those functions to use the new APIs. Modify
> > > > > > > > > its callers to
> > > > > > > > > unmap
> > > > > > > > > the pointer returned. Since alloc_xen_pagetable_new()
> > > > > > > > > is almost
> > > > > > > > > never
> > > > > > > > > useful unless accompanied by page clearing and a
> > > > > > > > > mapping, introduce
> > > > > > > > > a
> > > > > > > > > helper alloc_map_clear_xen_pt() for this sequence.
> > > > > > > > > 
> > > > > > > > > Note that the change of virt_to_xen_l1e() also
> > > > > > > > > requires
> > > > > > > > > vmap_to_mfn() to
> > > > > > > > > unmap the page, which requires domain_page.h header
> > > > > > > > > in vmap.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > > > > > > > > Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
> > > > > > > > > Reviewed-by: Jan Beulich <jbeulich@suse.com>
> > > > > > > > > 
> > > > > > > > > ---
> > > > > > > > > Changed in v8:
> > > > > > > > > - s/virtual address/linear address/.
> > > > > > > > > - BUG_ON() on NULL return in vmap_to_mfn().
> > > > > > > > 
> > > > > > > > The justification for this should be recorded in the
> > > > > > > > description. In
> > > > > > > 
> > > > > > > Will do.
> > > > > > > 
> > > > > > > > reply to v7 I did even suggest how to easily address
> > > > > > > > the issue you
> > > > > > > > did notice with large pages, as well as alternative
> > > > > > > > behavior for
> > > > > > > > vmap_to_mfn().
> > > > > > > 
> > > > > > > One thing about adding SMALL_PAGES is that vmap is common
> > > > > > > code and I am
> > > > > > > not sure if the Arm side is happy with it.
> > > > > > 
> > > > > > At the moment, Arm is only using small mapping but I plan
> > > > > > to change that soon because we have regions that can be
> > > > > > fairly big.
> > > > > > 
> > > > > > Regardless that, the issue with vmap_to_mfn() is rather x86
> > > > > > specific. So I don't particularly like the idea to expose
> > > > > > such trick in common code.
> > > > > > 
> > > > > > Even on x86, I think this is not the right approach. Such
> > > > > > band-aid will impact the performance as, assuming
> > > > > > superpages are used, it will take longer to map and add
> > > > > > pressure on the TLBs.
> > > > > > 
> > > > > > I am aware that superpages will be useful for LiveUpdate,
> > > > > > but is there any use cases in upstream?
> > > > > 
> > > > > Superpage use by vmalloc() is purely occasional: You'd have
> > > > > to vmalloc()
> > > > > 2Mb or more _and_ the page-wise allocation ought to return
> > > > > 512
> > > > > consecutive pages in the right order. Getting 512 consecutive
> > > > > pages is
> > > > > possible in practice, but with the page allocator allocating
> > > > > top-down it
> > > > > is very unlikely for them to be returned in increasing-sorted 
> > > > > order.
> > > > 
> > > > So your assumption here is vmap_to_mfn() can only be called on
> > > > vmalloc-ed() area. While this may be the case in Xen today, the
> > > > name clearly suggest it can be called on all vmap-ed region.
> > > 
> > > No, I don't make this assumption, and I did spell this out in an
> > > earlier
> > > reply to Hongyan: Parties using vmap() on a sufficiently large
> > > address
> > > range with consecutive MFNs simply have to be aware that they may
> > > not
> > > call vmap_to_mfn().
> > 
> > You make it sounds easy to be aware, however there are two
> > implementations of vmap_to_mfn() (one per arch). Even looking at
> > the x86 version, it is not obvious there is a restriction.
> 
> I didn't mean to make it sound like this - I agree it's not an
> obvious
> restriction.
> 
> > So I am a bit concerned of the "misuse" of the function. This could
> > possibly be documented. Although, I am not entirely happy to
> > restrict the use because of x86.
> 
> Unless the underlying issue gets fixed, we need _some_ form of bodge.
> I'm not really happy with the BUG_ON() as proposed by Hongyan. You're
> not really happy with my first proposed alternative, and you didn't
> comment on the 2nd one (still kept in context below). Not sure what
> to do: Throw dice?

Actually I did not propose the BUG_ON() fix. I was just in favor of it
when Jan presented it as a choice in v7.

The reason I am in favor of it is that even without it, the existing
x86 code already BUG_ON()s when the vmap area has a superpage anyway,
so it's not like the other alternatives behave any differently for
superpages. I am also not sure about returning INVALID_MFN, because if
virt_to_xen_l1e() really returns NULL, then we are calling
vmap_to_mfn() on a non-vmap address (not even populated), which frankly
deserves at least an ASSERT(). So, I don't think BUG_ON() is a bad idea
for now, until vmap_to_mfn() supports superpages.

Of course, we could use MAP_SMALL_PAGES to avoid the whole problem, but
Arm may not be happy. After a quick chat with Julien, how about having
ARCH_VMAP_FLAGS and only small pages for x86?
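
Purely to illustrate the idea (none of these names exist in the tree
today, so treat them all as placeholders rather than a real interface):

/* x86 (hypothetical): never use superpages for vmap mappings. */
#define ARCH_VMAP_FLAGS MAP_SMALL_PAGES

/* Arm (hypothetical): no extra restriction needed. */
#define ARCH_VMAP_FLAGS 0

/* ... and __vmap() in common/vmap.c would OR this into its flags: */
map_pages_to_xen(cur, *mfn, granularity, flags | ARCH_VMAP_FLAGS);
/* error handling / unwinding as today */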

Hongyan



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 03/15] x86/mm: rewrite virt_to_xen_l*e
  2020-11-30 12:13                   ` Hongyan Xia
@ 2020-11-30 12:50                     ` Jan Beulich
  2020-11-30 14:13                       ` Hongyan Xia
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2020-11-30 12:50 UTC (permalink / raw)
  To: Hongyan Xia, Julien Grall
  Cc: xen-devel, jgrall, Andrew Cooper, Wei Liu, Roger Pau Monné,
	George Dunlap, Ian Jackson, Stefano Stabellini

On 30.11.2020 13:13, Hongyan Xia wrote:
> On Tue, 2020-08-18 at 18:16 +0200, Jan Beulich wrote:
>> On 18.08.2020 15:08, Julien Grall wrote:
>>> On 18/08/2020 12:30, Jan Beulich wrote:
>>>> On 18.08.2020 12:13, Julien Grall wrote:
>>>>> On 18/08/2020 09:49, Jan Beulich wrote:
>>>>>> On 13.08.2020 19:22, Julien Grall wrote:
>>>>>>> On 13/08/2020 17:08, Hongyan Xia wrote:
>>>>>>>> On Fri, 2020-08-07 at 16:05 +0200, Jan Beulich wrote:
>>>>>>>>> On 27.07.2020 16:21, Hongyan Xia wrote:
>>>>>>>>>> From: Wei Liu <wei.liu2@citrix.com>
>>>>>>>>>>
>>>>>>>>>> Rewrite those functions to use the new APIs. Modify
>>>>>>>>>> its callers to
>>>>>>>>>> unmap
>>>>>>>>>> the pointer returned. Since alloc_xen_pagetable_new()
>>>>>>>>>> is almost
>>>>>>>>>> never
>>>>>>>>>> useful unless accompanied by page clearing and a
>>>>>>>>>> mapping, introduce
>>>>>>>>>> a
>>>>>>>>>> helper alloc_map_clear_xen_pt() for this sequence.
>>>>>>>>>>
>>>>>>>>>> Note that the change of virt_to_xen_l1e() also
>>>>>>>>>> requires
>>>>>>>>>> vmap_to_mfn() to
>>>>>>>>>> unmap the page, which requires domain_page.h header
>>>>>>>>>> in vmap.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
>>>>>>>>>> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
>>>>>>>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>>>>>>>>>
>>>>>>>>>> ---
>>>>>>>>>> Changed in v8:
>>>>>>>>>> - s/virtual address/linear address/.
>>>>>>>>>> - BUG_ON() on NULL return in vmap_to_mfn().
>>>>>>>>>
>>>>>>>>> The justification for this should be recorded in the
>>>>>>>>> description. In
>>>>>>>>
>>>>>>>> Will do.
>>>>>>>>
>>>>>>>>> reply to v7 I did even suggest how to easily address
>>>>>>>>> the issue you
>>>>>>>>> did notice with large pages, as well as alternative
>>>>>>>>> behavior for
>>>>>>>>> vmap_to_mfn().
>>>>>>>>
>>>>>>>> One thing about adding SMALL_PAGES is that vmap is common
>>>>>>>> code and I am
>>>>>>>> not sure if the Arm side is happy with it.
>>>>>>>
>>>>>>> At the moment, Arm is only using small mapping but I plan
>>>>>>> to change that soon because we have regions that can be
>>>>>>> fairly big.
>>>>>>>
>>>>>>> Regardless that, the issue with vmap_to_mfn() is rather x86
>>>>>>> specific. So I don't particularly like the idea to expose
>>>>>>> such trick in common code.
>>>>>>>
>>>>>>> Even on x86, I think this is not the right approach. Such
>>>>>>> band-aid will impact the performance as, assuming
>>>>>>> superpages are used, it will take longer to map and add
>>>>>>> pressure on the TLBs.
>>>>>>>
>>>>>>> I am aware that superpages will be useful for LiveUpdate,
>>>>>>> but is there any use cases in upstream?
>>>>>>
>>>>>> Superpage use by vmalloc() is purely occasional: You'd have
>>>>>> to vmalloc()
>>>>>> 2Mb or more _and_ the page-wise allocation ought to return
>>>>>> 512
>>>>>> consecutive pages in the right order. Getting 512 consecutive
>>>>>> pages is
>>>>>> possible in practice, but with the page allocator allocating
>>>>>> top-down it
>>>>>> is very unlikely for them to be returned in increasing-sorted 
>>>>>> order.
>>>>>
>>>>> So your assumption here is vmap_to_mfn() can only be called on
>>>>> vmalloc-ed() area. While this may be the case in Xen today, the
>>>>> name clearly suggest it can be called on all vmap-ed region.
>>>>
>>>> No, I don't make this assumption, and I did spell this out in an
>>>> earlier
>>>> reply to Hongyan: Parties using vmap() on a sufficiently large
>>>> address
>>>> range with consecutive MFNs simply have to be aware that they may
>>>> not
>>>> call vmap_to_mfn().
>>>
>>> You make it sounds easy to be aware, however there are two
>>> implementations of vmap_to_mfn() (one per arch). Even looking at
>>> the x86 version, it is not obvious there is a restriction.
>>
>> I didn't mean to make it sound like this - I agree it's not an
>> obvious
>> restriction.
>>
>>> So I am a bit concerned of the "misuse" of the function. This could
>>> possibly be documented. Although, I am not entirely happy to
>>> restrict the use because of x86.
>>
>> Unless the underlying issue gets fixed, we need _some_ form of bodge.
>> I'm not really happy with the BUG_ON() as proposed by Hongyan. You're
>> not really happy with my first proposed alternative, and you didn't
>> comment on the 2nd one (still kept in context below). Not sure what
>> to do: Throw dice?
> 
> Actually I did not propose the BUG_ON() fix. I was just in favor of it
> when Jan presented it as a choice in v7.
> 
> The reason I am in favor of it is that even without it, the existing
> x86 code already BUG_ON() when vmap has a superpage anyway, so it's not
> like other alternatives behave any differently for superpages. I am
> also not sure about returning INVALID_MFN because if virt_to_xen_l1e()
> really returns NULL, then we are calling vmap_to_mfn() on a non-vmap
> address (not even populated) which frankly deserves at least ASSERT().
> So, I don't think BUG_ON() is a bad idea for now before vmap_to_mfn()
> supports superpages.
> 
> Of course, we could use MAP_SMALL_PAGES to avoid the whole problem, but
> Arm may not be happy. After a quick chat with Julien, how about having
> ARCH_VMAP_FLAGS and only small pages for x86?

Possibly, albeit this will then make it look less like a bodge and
more like we would want to keep it this way. How difficult would it
be to actually make the thing work with superpages? Is it more than
simply
(pseudocode, potentially needed locking omitted):

vmap_to_mfn(va) {
    pl1e = virt_to_xen_l1e(va);
    if ( pl1e )          /* small (4k) mapping */
        return l1e_get_mfn(*pl1e);
    pl2e = virt_to_xen_l2e(va);
    if ( pl2e )          /* 2M superpage */
        return l2e_get_mfn(*pl2e) + suitable_bits(va);
    /* 1G superpage */
    return l3e_get_mfn(*virt_to_xen_l3e(va)) + suitable_bits(va);
}

(assuming virt_to_xen_l<N>e() would be returning NULL in such a case)?
Not very efficient, but not needed anywhere anyway - the sole user of
the construct is domain_page_map_to_mfn(), which maps only individual
pages. (An even better option would be to avoid the recurring walk of
the higher levels by using only virt_to_xen_l3e() and then doing the
remaining steps "by hand". This would at once allow to avoid the here
unwanted / unneeded checking for whether page tables need allocating.)
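
For completeness, a rough sketch of such a "by hand" walk (locking and
the unmapping of the L3 pointer itself omitted, error handling reduced
to ASSERT()s, and the superpage offset arithmetic merely illustrative -
this is not meant as the final implementation):

static mfn_t vmap_to_mfn_sketch(unsigned long va)
{
    const l3_pgentry_t *pl3e = virt_to_xen_l3e(va);
    const l2_pgentry_t *l2t;
    const l1_pgentry_t *l1t;
    l2_pgentry_t l2e;
    l1_pgentry_t l1e;

    ASSERT(pl3e && (l3e_get_flags(*pl3e) & _PAGE_PRESENT));

    if ( l3e_get_flags(*pl3e) & _PAGE_PSE )      /* 1G superpage */
        return mfn_add(l3e_get_mfn(*pl3e),
                       PFN_DOWN(va & ((1UL << L3_PAGETABLE_SHIFT) - 1)));

    /* Map the L2 table, pick out the entry, drop the mapping again. */
    l2t = map_domain_page(l3e_get_mfn(*pl3e));
    l2e = l2t[l2_table_offset(va)];
    unmap_domain_page(l2t);
    ASSERT(l2e_get_flags(l2e) & _PAGE_PRESENT);

    if ( l2e_get_flags(l2e) & _PAGE_PSE )        /* 2M superpage */
        return mfn_add(l2e_get_mfn(l2e),
                       PFN_DOWN(va & ((1UL << L2_PAGETABLE_SHIFT) - 1)));

    /* Small page: one more level. */
    l1t = map_domain_page(l2e_get_mfn(l2e));
    l1e = l1t[l1_table_offset(va)];
    unmap_domain_page(l1t);
    ASSERT(l1e_get_flags(l1e) & _PAGE_PRESENT);

    return l1e_get_mfn(l1e);
}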

Jan


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 03/15] x86/mm: rewrite virt_to_xen_l*e
  2020-11-30 12:50                     ` Jan Beulich
@ 2020-11-30 14:13                       ` Hongyan Xia
  2020-11-30 14:47                         ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Hongyan Xia @ 2020-11-30 14:13 UTC (permalink / raw)
  To: Jan Beulich, Julien Grall
  Cc: xen-devel, jgrall, Andrew Cooper, Wei Liu, Roger Pau Monné,
	George Dunlap, Ian Jackson, Stefano Stabellini

On Mon, 2020-11-30 at 13:50 +0100, Jan Beulich wrote:
> On 30.11.2020 13:13, Hongyan Xia wrote:
> > On Tue, 2020-08-18 at 18:16 +0200, Jan Beulich wrote:
> > [...]
> > 
> > Actually I did not propose the BUG_ON() fix. I was just in favor of
> > it
> > when Jan presented it as a choice in v7.
> > 
> > The reason I am in favor of it is that even without it, the
> > existing
> > x86 code already BUG_ON() when vmap has a superpage anyway, so it's
> > not
> > like other alternatives behave any differently for superpages. I am
> > also not sure about returning INVALID_MFN because if
> > virt_to_xen_l1e()
> > really returns NULL, then we are calling vmap_to_mfn() on a non-
> > vmap
> > address (not even populated) which frankly deserves at least
> > ASSERT().
> > So, I don't think BUG_ON() is a bad idea for now before
> > vmap_to_mfn()
> > supports superpages.
> > 
> > Of course, we could use MAP_SMALL_PAGES to avoid the whole problem,
> > but
> > Arm may not be happy. After a quick chat with Julien, how about
> > having
> > ARCH_VMAP_FLAGS and only small pages for x86?
> 
> Possibly, albeit this will then make it look less like a bodge and
> more like we would want to keep it this way. How difficult would it
> be to actually make the thing work with superpages? Is it more than
> simply
> (pseudocode, potentially needed locking omitted):
> 
> vmap_to_mfn(va) {
>     pl1e = virt_to_xen_l1e(va);
>     if ( pl1e )
>         return l1e_get_mfn(*pl1e);
>     pl2e = virt_to_xen_l2e(va);
>     if ( pl2e )
>         return l2e_get_mfn(*pl2e) + suitable_bits(va);
>     return l3e_get_mfn(*virt_to_xen_l3e(va)) + suitable_bits(va);
> }
> 
> (assuming virt_to_xen_l<N>e() would be returning NULL in such a
> case)?

The sad part is that instead of returning NULL, such functions BUG_ON()
when there is a superpage, which is why this solution was not
considered. Changing the logic from BUG_ON() to returning NULL might
not be straightforward, since so far the callers assume NULL means
-ENOMEM and not anything else.

> Not very efficient, but not needed anywhere anyway - the sole user of
> the construct is domain_page_map_to_mfn(), which maps only individual
> pages. (An even better option would be to avoid the recurring walk of
> the higher levels by using only virt_to_xen_l3e() and then doing the
> remaining steps "by hand". This would at once allow to avoid the here
> unwanted / unneeded checking for whether page tables need
> allocating.)

The "even better option" looks more promising to me, and is what I want
to go forward with. At any rate, this fix has grown larger than
intended, so I want to send it as an individual patch. Any objections?

Hongyan



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 03/15] x86/mm: rewrite virt_to_xen_l*e
  2020-11-30 14:13                       ` Hongyan Xia
@ 2020-11-30 14:47                         ` Jan Beulich
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2020-11-30 14:47 UTC (permalink / raw)
  To: Hongyan Xia
  Cc: xen-devel, jgrall, Andrew Cooper, Wei Liu, Roger Pau Monné,
	George Dunlap, Ian Jackson, Stefano Stabellini, Julien Grall

On 30.11.2020 15:13, Hongyan Xia wrote:
> On Mon, 2020-11-30 at 13:50 +0100, Jan Beulich wrote:
>> Not very efficient, but not needed anywhere anyway - the sole user of
>> the construct is domain_page_map_to_mfn(), which maps only individual
>> pages. (An even better option would be to avoid the recurring walk of
>> the higher levels by using only virt_to_xen_l3e() and then doing the
>> remaining steps "by hand". This would at once allow to avoid the here
>> unwanted / unneeded checking for whether page tables need
>> allocating.)
> 
> The "even better option" looks more promising to me, and is what I want
> to go forward with. At any rate, this fix grows larger than intended
> and I want to send this as an individual patch. Any objections?

Definitely not - separate changes are almost always easier to look
at and faster to get in.

Jan


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v8 03/15] x86/mm: rewrite virt_to_xen_l*e
  2020-07-27 14:21 ` [PATCH v8 03/15] x86/mm: rewrite virt_to_xen_l*e Hongyan Xia
  2020-08-07 14:05   ` Jan Beulich
@ 2020-12-07 15:28   ` Hongyan Xia
  1 sibling, 0 replies; 31+ messages in thread
From: Hongyan Xia @ 2020-12-07 15:28 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Julien Grall, Wei Liu, Andrew Cooper,
	Ian Jackson, George Dunlap, Jan Beulich, Roger Pau Monné

On Mon, 2020-07-27 at 15:21 +0100, Hongyan Xia wrote:
> From: Wei Liu <wei.liu2@citrix.com>
> 
> Rewrite those functions to use the new APIs. Modify its callers to
> unmap
> the pointer returned. Since alloc_xen_pagetable_new() is almost never
> useful unless accompanied by page clearing and a mapping, introduce a
> helper alloc_map_clear_xen_pt() for this sequence.
> 
> Note that the change of virt_to_xen_l1e() also requires vmap_to_mfn()
> to
> unmap the page, which requires domain_page.h header in vmap.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

I believe the vmap part can be removed since x86 now handles
superpages.

> @@ -5085,8 +5117,8 @@ int map_pages_to_xen(
>      unsigned int flags)
>  {
>      bool locking = system_state > SYS_STATE_boot;
> -    l3_pgentry_t *pl3e, ol3e;
> -    l2_pgentry_t *pl2e, ol2e;
> +    l3_pgentry_t *pl3e = NULL, ol3e;
> +    l2_pgentry_t *pl2e = NULL, ol2e;
>      l1_pgentry_t *pl1e, ol1e;
>      unsigned int  i;
>      int rc = -ENOMEM;
> @@ -5107,6 +5139,10 @@ int map_pages_to_xen(
>  
>      while ( nr_mfns != 0 )
>      {
> +        /* Clean up mappings mapped in the previous iteration. */
> +        UNMAP_DOMAIN_PAGE(pl3e);
> +        UNMAP_DOMAIN_PAGE(pl2e);
> +
>          pl3e = virt_to_xen_l3e(virt);
>  
>          if ( !pl3e )

While rebasing, I found another issue. XSA-345 now locks the L3 table
via L3T_LOCK(virt_to_page(pl3e)), but with this series we cannot call
virt_to_page() here.

We could call domain_page_map_to_mfn() on pl3e to get back the mfn,
which should be fine since this function is rarely used outside boot so
the overhead should be low. We could probably pass an *mfn in as an
additional argument, but do we want to change this also for
virt_to_xen_l[21]e() to be consistent (although they don't need the
mfn)? I might also need to remove the R-b due to this non-trivial
change.
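
For the domain_page_map_to_mfn() option, the locking side would look
roughly like this (untested sketch, local variable name made up):

    pl3e = virt_to_xen_l3e(virt);
    if ( !pl3e )
        goto out;

    /*
     * pl3e is a domheap mapping now, so virt_to_page() cannot be used
     * on it; recover the L3 table's page via its MFN instead.
     */
    l3t_page = mfn_to_page(domain_page_map_to_mfn(pl3e));
    L3T_LOCK(l3t_page);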

Thoughts?

Hongyan



^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread

Thread overview: 31+ messages
2020-07-27 14:21 [PATCH v8 00/15] switch to domheap for Xen page tables Hongyan Xia
2020-07-27 14:21 ` [PATCH v8 01/15] x86/mm: map_pages_to_xen would better have one exit path Hongyan Xia
2020-07-27 14:21 ` [PATCH v8 02/15] x86/mm: make sure there is one exit path for modify_xen_mappings Hongyan Xia
2020-07-27 14:21 ` [PATCH v8 03/15] x86/mm: rewrite virt_to_xen_l*e Hongyan Xia
2020-08-07 14:05   ` Jan Beulich
2020-08-13 16:08     ` Hongyan Xia
2020-08-13 17:22       ` Julien Grall
2020-08-18  8:49         ` Jan Beulich
2020-08-18 10:13           ` Julien Grall
2020-08-18 11:30             ` Jan Beulich
2020-08-18 13:08               ` Julien Grall
2020-08-18 16:16                 ` Jan Beulich
2020-11-30 12:13                   ` Hongyan Xia
2020-11-30 12:50                     ` Jan Beulich
2020-11-30 14:13                       ` Hongyan Xia
2020-11-30 14:47                         ` Jan Beulich
2020-12-07 15:28   ` Hongyan Xia
2020-07-27 14:21 ` [PATCH v8 04/15] x86/mm: switch to new APIs in map_pages_to_xen Hongyan Xia
2020-07-27 14:21 ` [PATCH v8 05/15] x86/mm: switch to new APIs in modify_xen_mappings Hongyan Xia
2020-07-27 14:21 ` [PATCH v8 06/15] x86_64/mm: introduce pl2e in paging_init Hongyan Xia
2020-07-27 14:21 ` [PATCH v8 07/15] x86_64/mm: switch to new APIs " Hongyan Xia
2020-08-07 14:09   ` Jan Beulich
2020-07-27 14:21 ` [PATCH v8 08/15] x86_64/mm: switch to new APIs in setup_m2p_table Hongyan Xia
2020-07-27 14:21 ` [PATCH v8 09/15] efi: use new page table APIs in copy_mapping Hongyan Xia
2020-08-07 14:13   ` Jan Beulich
2020-07-27 14:22 ` [PATCH v8 10/15] efi: switch to new APIs in EFI code Hongyan Xia
2020-07-27 14:22 ` [PATCH v8 11/15] x86/smpboot: add exit path for clone_mapping() Hongyan Xia
2020-07-27 14:22 ` [PATCH v8 12/15] x86/smpboot: switch clone_mapping() to new APIs Hongyan Xia
2020-07-27 14:22 ` [PATCH v8 13/15] x86/mm: drop old page table APIs Hongyan Xia
2020-07-27 14:22 ` [PATCH v8 14/15] x86: switch to use domheap page for page tables Hongyan Xia
2020-07-27 14:22 ` [PATCH v8 15/15] x86/mm: drop _new suffix for page table APIs Hongyan Xia
