* [PATCH v5 00/23] x86: refactor mm.c
@ 2017-09-14 12:58 Wei Liu
  2017-09-14 12:58 ` [PATCH v5 01/23] x86/mm: move guest_get_eff_l1e to pv/mm.h Wei Liu
                   ` (22 more replies)
  0 siblings, 23 replies; 57+ messages in thread
From: Wei Liu @ 2017-09-14 12:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

This series is rather different from v4 because of two things:

1. The staging branch has changed a lot.
2. Where appropriate, things are now exported via local headers.

The end result is that x86/mm.c goes from 6341 lines to 2930 lines, which
means more than half of the file has been moved.

This series can be found at:
 https://xenbits.xen.org/git-http/people/liuw/xen.git wip.split-mm-v5

Wei.

Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>

Wei Liu (23):
  x86/mm: move guest_get_eff_l1e to pv/mm.h
  x86/mm: export get_page_from_mfn
  x86/mm: move update_intpte to pv/mm.h
  x86/mm: move {un,}adjust_guest_l*e to pv/mm.h
  x86/mm: move ro page fault emulation code
  x86/mm: remove the now unused inclusion of pv/emulate.h
  x86/mm: move map_guest_l1e to pv/mm.c
  x86/mm: split out pv grant table code
  x86/mm: add pv prefix to {set,destroy}_gdt
  x86/mm: split out descriptor table manipulation code
  x86/mm: move compat descriptor table manipulation code
  x86/mm: move and rename map_ldt_shadow_page
  x86/mm: factor out pv_arch_init_memory
  x86/mm: move PV l4 table setup code
  x86/mm: move declaration of new_guest_cr3 to local pv/mm.h
  x86/mm: add pv prefix to {alloc,free}_page_type
  x86/mm: export base_disallow_mask and l1 mask in asm-x86/mm.h
  x86/mm: export some stuff via local mm.h
  x86/mm: export get_page_light via asm-x86/mm.h
  x86/mm: split out PV mm code to pv/mm.c
  x86/mm: move and add pv prefix to invalidate_shadow_ldt
  x86/mm: split out PV mm hypercalls to pv/mm-hypercalls.c
  x86/mm: remove the now unused inclusion of pv/mm.h

 xen/arch/x86/domain.c               |   13 +-
 xen/arch/x86/mm.c                   | 4255 ++++-------------------------------
 xen/arch/x86/pv/Makefile            |    5 +
 xen/arch/x86/pv/descriptor-tables.c |  232 ++
 xen/arch/x86/pv/dom0_build.c        |    2 +
 xen/arch/x86/pv/domain.c            |    5 +
 xen/arch/x86/pv/emul-priv-op.c      |    1 +
 xen/arch/x86/pv/grant_table.c       |  327 +++
 xen/arch/x86/pv/mm-hypercalls.c     | 1461 ++++++++++++
 xen/arch/x86/pv/mm.c                | 1051 +++++++++
 xen/arch/x86/pv/mm.h                |  176 ++
 xen/arch/x86/pv/ro-page-fault.c     |  399 ++++
 xen/arch/x86/traps.c                |    5 +-
 xen/arch/x86/x86_64/compat/mm.c     |   39 +-
 xen/include/asm-x86/mm.h            |   38 +-
 xen/include/asm-x86/processor.h     |    5 -
 xen/include/asm-x86/pv/mm.h         |   72 +
 17 files changed, 4192 insertions(+), 3894 deletions(-)
 create mode 100644 xen/arch/x86/pv/descriptor-tables.c
 create mode 100644 xen/arch/x86/pv/grant_table.c
 create mode 100644 xen/arch/x86/pv/mm-hypercalls.c
 create mode 100644 xen/arch/x86/pv/mm.c
 create mode 100644 xen/arch/x86/pv/mm.h
 create mode 100644 xen/arch/x86/pv/ro-page-fault.c
 create mode 100644 xen/include/asm-x86/pv/mm.h

-- 
2.11.0



* [PATCH v5 01/23] x86/mm: move guest_get_eff_l1e to pv/mm.h
  2017-09-14 12:58 [PATCH v5 00/23] x86: refactor mm.c Wei Liu
@ 2017-09-14 12:58 ` Wei Liu
       [not found]   ` <1505408055.662832341@apps.rackspace.com>
  2017-09-22 11:36   ` Jan Beulich
  2017-09-14 12:58 ` [PATCH v5 02/23] x86/mm: export get_page_from_mfn Wei Liu
                   ` (21 subsequent siblings)
  22 siblings, 2 replies; 57+ messages in thread
From: Wei Liu @ 2017-09-14 12:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Make it static inline. It will be used by map_ldt_shadow_page and the ro
page fault emulation code later.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
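For illustration, a minimal sketch of how a later consumer uses this
helper (mirroring the read-only page fault path added later in the
series; the function name here is hypothetical):

/* Sketch only: a failed read yields l1e_empty(), so callers just test flags. */
static int example_handle_fault(unsigned long addr)
{
    l1_pgentry_t pte = guest_get_eff_l1e(addr);

    /* Only present, read-only mappings are of interest here. */
    if ( (l1e_get_flags(pte) & (_PAGE_PRESENT | _PAGE_RW)) != _PAGE_PRESENT )
        return 0;

    /* ... dispatch to the appropriate emulation ... */
    return 1;
}
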
 xen/arch/x86/mm.c    | 18 +-----------------
 xen/arch/x86/pv/mm.h | 21 +++++++++++++++++++++
 2 files changed, 22 insertions(+), 17 deletions(-)
 create mode 100644 xen/arch/x86/pv/mm.h

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 5208e73734..8d2a4682c9 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -127,6 +127,7 @@
 #include <asm/pv/grant_table.h>
 
 #include "pv/emulate.h"
+#include "pv/mm.h"
 
 /* Override macros from asm/page.h to make them work with mfn_t */
 #undef mfn_to_page
@@ -563,23 +564,6 @@ static l1_pgentry_t *map_guest_l1e(unsigned long linear, mfn_t *gl1mfn)
     return (l1_pgentry_t *)map_domain_page(*gl1mfn) + l1_table_offset(linear);
 }
 
-/* Read a PV guest's l1e that maps this linear address. */
-static l1_pgentry_t guest_get_eff_l1e(unsigned long linear)
-{
-    l1_pgentry_t l1e;
-
-    ASSERT(!paging_mode_translate(current->domain));
-    ASSERT(!paging_mode_external(current->domain));
-
-    if ( unlikely(!__addr_ok(linear)) ||
-         __copy_from_user(&l1e,
-                          &__linear_l1_table[l1_linear_offset(linear)],
-                          sizeof(l1_pgentry_t)) )
-        l1e = l1e_empty();
-
-    return l1e;
-}
-
 /*
  * Read the guest's l1e that maps this address, from the kernel-mode
  * page tables.
diff --git a/xen/arch/x86/pv/mm.h b/xen/arch/x86/pv/mm.h
new file mode 100644
index 0000000000..5b5dbff433
--- /dev/null
+++ b/xen/arch/x86/pv/mm.h
@@ -0,0 +1,21 @@
+#ifndef __PV_MM_H__
+#define __PV_MM_H__
+
+/* Read a PV guest's l1e that maps this linear address. */
+static inline l1_pgentry_t guest_get_eff_l1e(unsigned long linear)
+{
+    l1_pgentry_t l1e;
+
+    ASSERT(!paging_mode_translate(current->domain));
+    ASSERT(!paging_mode_external(current->domain));
+
+    if ( unlikely(!__addr_ok(linear)) ||
+         __copy_from_user(&l1e,
+                          &__linear_l1_table[l1_linear_offset(linear)],
+                          sizeof(l1_pgentry_t)) )
+        l1e = l1e_empty();
+
+    return l1e;
+}
+
+#endif /* __PV_MM_H__ */
-- 
2.11.0



* [PATCH v5 02/23] x86/mm: export get_page_from_mfn
  2017-09-14 12:58 [PATCH v5 00/23] x86: refactor mm.c Wei Liu
  2017-09-14 12:58 ` [PATCH v5 01/23] x86/mm: move guest_get_eff_l1e to pv/mm.h Wei Liu
@ 2017-09-14 12:58 ` Wei Liu
  2017-09-22 11:44   ` Jan Beulich
  2017-09-14 12:58 ` [PATCH v5 03/23] x86/mm: move update_intpte to pv/mm.h Wei Liu
                   ` (20 subsequent siblings)
  22 siblings, 1 reply; 57+ messages in thread
From: Wei Liu @ 2017-09-14 12:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

It will be used later in multiple files.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
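As a usage sketch (hypothetical caller, not part of this patch), the
helper pairs with put_page() in the usual take-reference/use/release
pattern:

static bool example_use_mfn(mfn_t mfn, struct domain *d)
{
    struct page_info *page;

    if ( !get_page_from_mfn(mfn, d) )
        return false;   /* Invalid MFN, or reference not obtainable. */

    page = mfn_to_page(mfn);
    /* ... operate on the page ... */
    put_page(page);     /* Balance the reference taken above. */

    return true;
}
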
 xen/arch/x86/mm.c        | 16 ----------------
 xen/include/asm-x86/mm.h | 14 ++++++++++++++
 2 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 8d2a4682c9..faa161b767 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -705,22 +705,6 @@ bool map_ldt_shadow_page(unsigned int offset)
     return true;
 }
 
-
-static bool get_page_from_mfn(mfn_t mfn, struct domain *d)
-{
-    struct page_info *page = mfn_to_page(mfn);
-
-    if ( unlikely(!mfn_valid(mfn)) || unlikely(!get_page(page, d)) )
-    {
-        gdprintk(XENLOG_WARNING,
-                 "Could not get page ref for mfn %"PRI_mfn"\n", mfn_x(mfn));
-        return false;
-    }
-
-    return true;
-}
-
-
 static int get_page_and_type_from_mfn(
     mfn_t mfn, unsigned long type, struct domain *d,
     int partial, int preemptible)
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index bef45e8e9f..7670912e0a 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -369,6 +369,20 @@ int  get_page_from_l1e(
     l1_pgentry_t l1e, struct domain *l1e_owner, struct domain *pg_owner);
 void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner);
 
+static inline bool get_page_from_mfn(mfn_t mfn, struct domain *d)
+{
+    struct page_info *page = __mfn_to_page(mfn_x(mfn));
+
+    if ( unlikely(!mfn_valid(mfn)) || unlikely(!get_page(page, d)) )
+    {
+        gdprintk(XENLOG_WARNING,
+                 "Could not get page ref for mfn %"PRI_mfn"\n", mfn_x(mfn));
+        return false;
+    }
+
+    return true;
+}
+
 static inline void put_page_and_type(struct page_info *page)
 {
     put_page_type(page);
-- 
2.11.0



* [PATCH v5 03/23] x86/mm: move update_intpte to pv/mm.h
  2017-09-14 12:58 [PATCH v5 00/23] x86: refactor mm.c Wei Liu
  2017-09-14 12:58 ` [PATCH v5 01/23] x86/mm: move guest_get_eff_l1e to pv/mm.h Wei Liu
  2017-09-14 12:58 ` [PATCH v5 02/23] x86/mm: export get_page_from_mfn Wei Liu
@ 2017-09-14 12:58 ` Wei Liu
  2017-09-22 12:32   ` Jan Beulich
  2017-09-14 12:58 ` [PATCH v5 04/23] x86/mm: move {un, }adjust_guest_l*e " Wei Liu
                   ` (19 subsequent siblings)
  22 siblings, 1 reply; 57+ messages in thread
From: Wei Liu @ 2017-09-14 12:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

While at it, change the type of preserve_ad to bool.  Also move
UPDATE_ENTRY there.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
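The token pasting in UPDATE_ENTRY can be hard to read; as an
illustration, a typical l1 invocation expands roughly to:

/* UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn, v, 0) becomes: */
update_intpte(&l1e_get_intpte(*(pl1e)),
              l1e_get_intpte(ol1e), l1e_get_intpte(nl1e),
              (mfn), (v), (0));
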
 xen/arch/x86/mm.c    | 65 ----------------------------------------------------
 xen/arch/x86/pv/mm.h | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+), 65 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index faa161b767..504a0a2a6a 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -139,14 +139,6 @@
 l1_pgentry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
     l1_fixmap[L1_PAGETABLE_ENTRIES];
 
-/*
- * PTE updates can be done with ordinary writes except:
- *  1. Debug builds get extra checking by using CMPXCHG[8B].
- */
-#if !defined(NDEBUG)
-#define PTE_UPDATE_WITH_CMPXCHG
-#endif
-
 paddr_t __read_mostly mem_hotplug;
 
 /* Private domain structs for DOMID_XEN and DOMID_IO. */
@@ -1848,63 +1840,6 @@ void page_unlock(struct page_info *page)
     } while ( (y = cmpxchg(&page->u.inuse.type_info, x, nx)) != x );
 }
 
-/*
- * How to write an entry to the guest pagetables.
- * Returns false for failure (pointer not valid), true for success.
- */
-static inline bool update_intpte(
-    intpte_t *p, intpte_t old, intpte_t new, unsigned long mfn,
-    struct vcpu *v, int preserve_ad)
-{
-    bool rv = true;
-
-#ifndef PTE_UPDATE_WITH_CMPXCHG
-    if ( !preserve_ad )
-    {
-        rv = paging_write_guest_entry(v, p, new, _mfn(mfn));
-    }
-    else
-#endif
-    {
-        intpte_t t = old;
-
-        for ( ; ; )
-        {
-            intpte_t _new = new;
-
-            if ( preserve_ad )
-                _new |= old & (_PAGE_ACCESSED | _PAGE_DIRTY);
-
-            rv = paging_cmpxchg_guest_entry(v, p, &t, _new, _mfn(mfn));
-            if ( unlikely(rv == 0) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Failed to update %" PRIpte " -> %" PRIpte
-                         ": saw %" PRIpte "\n", old, _new, t);
-                break;
-            }
-
-            if ( t == old )
-                break;
-
-            /* Allowed to change in Accessed/Dirty flags only. */
-            BUG_ON((t ^ old) & ~(intpte_t)(_PAGE_ACCESSED|_PAGE_DIRTY));
-
-            old = t;
-        }
-    }
-    return rv;
-}
-
-/*
- * Macro that wraps the appropriate type-changes around update_intpte().
- * Arguments are: type, ptr, old, new, mfn, vcpu
- */
-#define UPDATE_ENTRY(_t,_p,_o,_n,_m,_v,_ad)                         \
-    update_intpte(&_t ## e_get_intpte(*(_p)),                       \
-                  _t ## e_get_intpte(_o), _t ## e_get_intpte(_n),   \
-                  (_m), (_v), (_ad))
-
 /*
  * PTE flags that a guest may change without re-validating the PTE.
  * All other bits affect translation, caching, or Xen's safety.
diff --git a/xen/arch/x86/pv/mm.h b/xen/arch/x86/pv/mm.h
index 5b5dbff433..2563d280a5 100644
--- a/xen/arch/x86/pv/mm.h
+++ b/xen/arch/x86/pv/mm.h
@@ -18,4 +18,69 @@ static inline l1_pgentry_t guest_get_eff_l1e(unsigned long linear)
     return l1e;
 }
 
+/*
+ * PTE updates can be done with ordinary writes except:
+ *  1. Debug builds get extra checking by using CMPXCHG[8B].
+ */
+#if !defined(NDEBUG)
+#define PTE_UPDATE_WITH_CMPXCHG
+#endif
+
+/*
+ * How to write an entry to the guest pagetables.
+ * Returns false for failure (pointer not valid), true for success.
+ */
+static inline bool update_intpte(intpte_t *p, intpte_t old, intpte_t new,
+                                 unsigned long mfn, struct vcpu *v,
+                                 bool preserve_ad)
+{
+    bool rv = true;
+
+#ifndef PTE_UPDATE_WITH_CMPXCHG
+    if ( !preserve_ad )
+    {
+        rv = paging_write_guest_entry(v, p, new, _mfn(mfn));
+    }
+    else
+#endif
+    {
+        intpte_t t = old;
+
+        for ( ; ; )
+        {
+            intpte_t _new = new;
+
+            if ( preserve_ad )
+                _new |= old & (_PAGE_ACCESSED | _PAGE_DIRTY);
+
+            rv = paging_cmpxchg_guest_entry(v, p, &t, _new, _mfn(mfn));
+            if ( unlikely(rv == 0) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Failed to update %" PRIpte " -> %" PRIpte
+                         ": saw %" PRIpte "\n", old, _new, t);
+                break;
+            }
+
+            if ( t == old )
+                break;
+
+            /* Allowed to change in Accessed/Dirty flags only. */
+            BUG_ON((t ^ old) & ~(intpte_t)(_PAGE_ACCESSED|_PAGE_DIRTY));
+
+            old = t;
+        }
+    }
+    return rv;
+}
+
+/*
+ * Macro that wraps the appropriate type-changes around update_intpte().
+ * Arguments are: type, ptr, old, new, mfn, vcpu
+ */
+#define UPDATE_ENTRY(_t,_p,_o,_n,_m,_v,_ad)                         \
+    update_intpte(&_t ## e_get_intpte(*(_p)),                       \
+                  _t ## e_get_intpte(_o), _t ## e_get_intpte(_n),   \
+                  (_m), (_v), (_ad))
+
 #endif /* __PV_MM_H__ */
-- 
2.11.0



* [PATCH v5 04/23] x86/mm: move {un, }adjust_guest_l*e to pv/mm.h
  2017-09-14 12:58 [PATCH v5 00/23] x86: refactor mm.c Wei Liu
                   ` (2 preceding siblings ...)
  2017-09-14 12:58 ` [PATCH v5 03/23] x86/mm: move update_intpte to pv/mm.h Wei Liu
@ 2017-09-14 12:58 ` Wei Liu
  2017-09-22 12:33   ` Jan Beulich
  2017-09-14 12:58 ` [PATCH v5 05/23] x86/mm: move ro page fault emulation code Wei Liu
                   ` (18 subsequent siblings)
  22 siblings, 1 reply; 57+ messages in thread
From: Wei Liu @ 2017-09-14 12:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c    | 57 -----------------------------------------------
 xen/arch/x86/pv/mm.h | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 62 insertions(+), 57 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 504a0a2a6a..b05648e4e7 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1175,63 +1175,6 @@ get_page_from_l4e(
     return rc;
 }
 
-static l1_pgentry_t adjust_guest_l1e(l1_pgentry_t l1e, const struct domain *d)
-{
-    if ( likely(l1e_get_flags(l1e) & _PAGE_PRESENT) &&
-         likely(!is_pv_32bit_domain(d)) )
-    {
-        /* _PAGE_GUEST_KERNEL page cannot have the Global bit set. */
-        if ( (l1e_get_flags(l1e) & (_PAGE_GUEST_KERNEL | _PAGE_GLOBAL)) ==
-             (_PAGE_GUEST_KERNEL | _PAGE_GLOBAL) )
-            gdprintk(XENLOG_WARNING, "Global bit is set in kernel page %lx\n",
-                     l1e_get_pfn(l1e));
-
-        if ( !(l1e_get_flags(l1e) & _PAGE_USER) )
-            l1e_add_flags(l1e, (_PAGE_GUEST_KERNEL | _PAGE_USER));
-
-        if ( !(l1e_get_flags(l1e) & _PAGE_GUEST_KERNEL) )
-            l1e_add_flags(l1e, (_PAGE_GLOBAL | _PAGE_USER));
-    }
-
-    return l1e;
-}
-
-static l2_pgentry_t adjust_guest_l2e(l2_pgentry_t l2e, const struct domain *d)
-{
-    if ( likely(l2e_get_flags(l2e) & _PAGE_PRESENT) &&
-         likely(!is_pv_32bit_domain(d)) )
-        l2e_add_flags(l2e, _PAGE_USER);
-
-    return l2e;
-}
-
-static l3_pgentry_t adjust_guest_l3e(l3_pgentry_t l3e, const struct domain *d)
-{
-    if ( likely(l3e_get_flags(l3e) & _PAGE_PRESENT) )
-        l3e_add_flags(l3e, (likely(!is_pv_32bit_domain(d))
-                            ? _PAGE_USER : _PAGE_USER | _PAGE_RW));
-
-    return l3e;
-}
-
-static l3_pgentry_t unadjust_guest_l3e(l3_pgentry_t l3e, const struct domain *d)
-{
-    if ( unlikely(is_pv_32bit_domain(d)) &&
-         likely(l3e_get_flags(l3e) & _PAGE_PRESENT) )
-        l3e_remove_flags(l3e, _PAGE_USER | _PAGE_RW | _PAGE_ACCESSED);
-
-    return l3e;
-}
-
-static l4_pgentry_t adjust_guest_l4e(l4_pgentry_t l4e, const struct domain *d)
-{
-    if ( likely(l4e_get_flags(l4e) & _PAGE_PRESENT) &&
-         likely(!is_pv_32bit_domain(d)) )
-        l4e_add_flags(l4e, _PAGE_USER);
-
-    return l4e;
-}
-
 void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
 {
     unsigned long     pfn = l1e_get_pfn(l1e);
diff --git a/xen/arch/x86/pv/mm.h b/xen/arch/x86/pv/mm.h
index 2563d280a5..6602e6ce07 100644
--- a/xen/arch/x86/pv/mm.h
+++ b/xen/arch/x86/pv/mm.h
@@ -83,4 +83,66 @@ static inline bool update_intpte(intpte_t *p, intpte_t old, intpte_t new,
                   _t ## e_get_intpte(_o), _t ## e_get_intpte(_n),   \
                   (_m), (_v), (_ad))
 
+static inline l1_pgentry_t adjust_guest_l1e(l1_pgentry_t l1e,
+                                            const struct domain *d)
+{
+    if ( likely(l1e_get_flags(l1e) & _PAGE_PRESENT) &&
+         likely(!is_pv_32bit_domain(d)) )
+    {
+        /* _PAGE_GUEST_KERNEL page cannot have the Global bit set. */
+        if ( (l1e_get_flags(l1e) & (_PAGE_GUEST_KERNEL | _PAGE_GLOBAL)) ==
+             (_PAGE_GUEST_KERNEL | _PAGE_GLOBAL) )
+            gdprintk(XENLOG_WARNING, "Global bit is set in kernel page %lx\n",
+                     l1e_get_pfn(l1e));
+
+        if ( !(l1e_get_flags(l1e) & _PAGE_USER) )
+            l1e_add_flags(l1e, (_PAGE_GUEST_KERNEL | _PAGE_USER));
+
+        if ( !(l1e_get_flags(l1e) & _PAGE_GUEST_KERNEL) )
+            l1e_add_flags(l1e, (_PAGE_GLOBAL | _PAGE_USER));
+    }
+
+    return l1e;
+}
+
+static inline l2_pgentry_t adjust_guest_l2e(l2_pgentry_t l2e,
+                                            const struct domain *d)
+{
+    if ( likely(l2e_get_flags(l2e) & _PAGE_PRESENT) &&
+         likely(!is_pv_32bit_domain(d)) )
+        l2e_add_flags(l2e, _PAGE_USER);
+
+    return l2e;
+}
+
+static inline l3_pgentry_t adjust_guest_l3e(l3_pgentry_t l3e,
+                                            const struct domain *d)
+{
+    if ( likely(l3e_get_flags(l3e) & _PAGE_PRESENT) )
+        l3e_add_flags(l3e, (likely(!is_pv_32bit_domain(d))
+                            ? _PAGE_USER : _PAGE_USER | _PAGE_RW));
+
+    return l3e;
+}
+
+static inline l3_pgentry_t unadjust_guest_l3e(l3_pgentry_t l3e,
+                                              const struct domain *d)
+{
+    if ( unlikely(is_pv_32bit_domain(d)) &&
+         likely(l3e_get_flags(l3e) & _PAGE_PRESENT) )
+        l3e_remove_flags(l3e, _PAGE_USER | _PAGE_RW | _PAGE_ACCESSED);
+
+    return l3e;
+}
+
+static inline l4_pgentry_t adjust_guest_l4e(l4_pgentry_t l4e,
+                                            const struct domain *d)
+{
+    if ( likely(l4e_get_flags(l4e) & _PAGE_PRESENT) &&
+         likely(!is_pv_32bit_domain(d)) )
+        l4e_add_flags(l4e, _PAGE_USER);
+
+    return l4e;
+}
+
 #endif /* __PV_MM_H__ */
-- 
2.11.0



* [PATCH v5 05/23] x86/mm: move ro page fault emulation code
  2017-09-14 12:58 [PATCH v5 00/23] x86: refactor mm.c Wei Liu
                   ` (3 preceding siblings ...)
  2017-09-14 12:58 ` [PATCH v5 04/23] x86/mm: move {un, }adjust_guest_l*e " Wei Liu
@ 2017-09-14 12:58 ` Wei Liu
  2017-09-22 12:37   ` Jan Beulich
  2017-09-14 12:58 ` [PATCH v5 06/23] x86/mm: remove the now unused inclusion of pv/emulate.h Wei Liu
                   ` (17 subsequent siblings)
  22 siblings, 1 reply; 57+ messages in thread
From: Wei Liu @ 2017-09-14 12:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Move the code to pv/ro-page-fault.c.  Create a new header file
asm-x86/pv/mm.h and move the declaration of pv_ro_page_fault there.
Include the new header file in traps.c.

Fix some coding style issues. The prefixes (ptwr and mmio) are
retained because there are two sets of emulation code in the same file.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
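For orientation, a hedged sketch of the shape of the call site in
traps.c (the exact guard conditions in the page fault fixup path, such
as PV guest and write fault on a present mapping, are elided here):

#include <asm/pv/mm.h>

static int example_fixup(unsigned long addr, struct cpu_user_regs *regs)
{
    /* ... PV-only and fault-type checks ... */

    /* Returns EXCRET_fault_fixed if the emulation handled the access. */
    return pv_ro_page_fault(addr, regs);
}
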
 xen/arch/x86/mm.c               | 364 ------------------------------------
 xen/arch/x86/pv/Makefile        |   1 +
 xen/arch/x86/pv/ro-page-fault.c | 399 ++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/traps.c            |   1 +
 xen/include/asm-x86/mm.h        |   2 -
 xen/include/asm-x86/pv/mm.h     |  38 ++++
 6 files changed, 439 insertions(+), 366 deletions(-)
 create mode 100644 xen/arch/x86/pv/ro-page-fault.c
 create mode 100644 xen/include/asm-x86/pv/mm.h

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index b05648e4e7..cc0a7cab41 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4729,265 +4729,6 @@ long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     return 0;
 }
 
-
-/*************************
- * Writable Pagetables
- */
-
-struct ptwr_emulate_ctxt {
-    unsigned long cr2;
-    l1_pgentry_t  pte;
-};
-
-static int ptwr_emulated_read(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_data,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    unsigned int rc = bytes;
-    unsigned long addr = offset;
-
-    if ( !__addr_ok(addr) ||
-         (rc = __copy_from_user(p_data, (void *)addr, bytes)) )
-    {
-        x86_emul_pagefault(0, addr + bytes - rc, ctxt);  /* Read fault. */
-        return X86EMUL_EXCEPTION;
-    }
-
-    return X86EMUL_OKAY;
-}
-
-static int ptwr_emulated_update(
-    unsigned long addr,
-    paddr_t old,
-    paddr_t val,
-    unsigned int bytes,
-    unsigned int do_cmpxchg,
-    struct x86_emulate_ctxt *ctxt)
-{
-    unsigned long mfn;
-    unsigned long unaligned_addr = addr;
-    struct page_info *page;
-    l1_pgentry_t pte, ol1e, nl1e, *pl1e;
-    struct vcpu *v = current;
-    struct domain *d = v->domain;
-    struct ptwr_emulate_ctxt *ptwr_ctxt = ctxt->data;
-    int ret;
-
-    /* Only allow naturally-aligned stores within the original %cr2 page. */
-    if ( unlikely(((addr ^ ptwr_ctxt->cr2) & PAGE_MASK) ||
-                  (addr & (bytes - 1))) )
-    {
-        gdprintk(XENLOG_WARNING, "bad access (cr2=%lx, addr=%lx, bytes=%u)\n",
-                 ptwr_ctxt->cr2, addr, bytes);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    /* Turn a sub-word access into a full-word access. */
-    if ( bytes != sizeof(paddr_t) )
-    {
-        paddr_t      full;
-        unsigned int rc, offset = addr & (sizeof(paddr_t) - 1);
-
-        /* Align address; read full word. */
-        addr &= ~(sizeof(paddr_t) - 1);
-        if ( (rc = copy_from_user(&full, (void *)addr, sizeof(paddr_t))) != 0 )
-        {
-            x86_emul_pagefault(0, /* Read fault. */
-                               addr + sizeof(paddr_t) - rc,
-                               ctxt);
-            return X86EMUL_EXCEPTION;
-        }
-        /* Mask out bits provided by caller. */
-        full &= ~((((paddr_t)1 << (bytes * 8)) - 1) << (offset * 8));
-        /* Shift the caller value and OR in the missing bits. */
-        val  &= (((paddr_t)1 << (bytes * 8)) - 1);
-        val <<= (offset) * 8;
-        val  |= full;
-        /* Also fill in missing parts of the cmpxchg old value. */
-        old  &= (((paddr_t)1 << (bytes * 8)) - 1);
-        old <<= (offset) * 8;
-        old  |= full;
-    }
-
-    pte  = ptwr_ctxt->pte;
-    mfn  = l1e_get_pfn(pte);
-    page = mfn_to_page(_mfn(mfn));
-
-    /* We are looking only for read-only mappings of p.t. pages. */
-    ASSERT((l1e_get_flags(pte) & (_PAGE_RW|_PAGE_PRESENT)) == _PAGE_PRESENT);
-    ASSERT(mfn_valid(_mfn(mfn)));
-    ASSERT((page->u.inuse.type_info & PGT_type_mask) == PGT_l1_page_table);
-    ASSERT((page->u.inuse.type_info & PGT_count_mask) != 0);
-    ASSERT(page_get_owner(page) == d);
-
-    /* Check the new PTE. */
-    nl1e = l1e_from_intpte(val);
-    switch ( ret = get_page_from_l1e(nl1e, d, d) )
-    {
-    default:
-        if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
-             !do_cmpxchg && (l1e_get_flags(nl1e) & _PAGE_PRESENT) )
-        {
-            /*
-             * If this is an upper-half write to a PAE PTE then we assume that
-             * the guest has simply got the two writes the wrong way round. We
-             * zap the PRESENT bit on the assumption that the bottom half will
-             * be written immediately after we return to the guest.
-             */
-            gdprintk(XENLOG_DEBUG, "ptwr_emulate: fixing up invalid PAE PTE %"
-                     PRIpte"\n", l1e_get_intpte(nl1e));
-            l1e_remove_flags(nl1e, _PAGE_PRESENT);
-        }
-        else
-        {
-            gdprintk(XENLOG_WARNING, "could not get_page_from_l1e()\n");
-            return X86EMUL_UNHANDLEABLE;
-        }
-        break;
-    case 0:
-        break;
-    case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
-        ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
-        l1e_flip_flags(nl1e, ret);
-        break;
-    }
-
-    nl1e = adjust_guest_l1e(nl1e, d);
-
-    /* Checked successfully: do the update (write or cmpxchg). */
-    pl1e = map_domain_page(_mfn(mfn));
-    pl1e = (l1_pgentry_t *)((unsigned long)pl1e + (addr & ~PAGE_MASK));
-    if ( do_cmpxchg )
-    {
-        bool okay;
-        intpte_t t = old;
-
-        ol1e = l1e_from_intpte(old);
-        okay = paging_cmpxchg_guest_entry(v, &l1e_get_intpte(*pl1e),
-                                          &t, l1e_get_intpte(nl1e), _mfn(mfn));
-        okay = (okay && t == old);
-
-        if ( !okay )
-        {
-            unmap_domain_page(pl1e);
-            put_page_from_l1e(nl1e, d);
-            return X86EMUL_RETRY;
-        }
-    }
-    else
-    {
-        ol1e = *pl1e;
-        if ( !UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn, v, 0) )
-            BUG();
-    }
-
-    trace_ptwr_emulation(addr, nl1e);
-
-    unmap_domain_page(pl1e);
-
-    /* Finally, drop the old PTE. */
-    put_page_from_l1e(ol1e, d);
-
-    return X86EMUL_OKAY;
-}
-
-static int ptwr_emulated_write(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_data,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    paddr_t val = 0;
-
-    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) || !bytes )
-    {
-        gdprintk(XENLOG_WARNING, "bad write size (addr=%lx, bytes=%u)\n",
-                 offset, bytes);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    memcpy(&val, p_data, bytes);
-
-    return ptwr_emulated_update(offset, 0, val, bytes, 0, ctxt);
-}
-
-static int ptwr_emulated_cmpxchg(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_old,
-    void *p_new,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    paddr_t old = 0, new = 0;
-
-    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) )
-    {
-        gdprintk(XENLOG_WARNING, "bad cmpxchg size (addr=%lx, bytes=%u)\n",
-                 offset, bytes);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    memcpy(&old, p_old, bytes);
-    memcpy(&new, p_new, bytes);
-
-    return ptwr_emulated_update(offset, old, new, bytes, 1, ctxt);
-}
-
-static const struct x86_emulate_ops ptwr_emulate_ops = {
-    .read       = ptwr_emulated_read,
-    .insn_fetch = ptwr_emulated_read,
-    .write      = ptwr_emulated_write,
-    .cmpxchg    = ptwr_emulated_cmpxchg,
-    .validate   = pv_emul_is_mem_write,
-    .cpuid      = pv_emul_cpuid,
-};
-
-/* Write page fault handler: check if guest is trying to modify a PTE. */
-static int ptwr_do_page_fault(struct x86_emulate_ctxt *ctxt,
-                              unsigned long addr, l1_pgentry_t pte)
-{
-    struct ptwr_emulate_ctxt ptwr_ctxt = {
-        .cr2 = addr,
-        .pte = pte,
-    };
-    struct page_info *page;
-    int rc;
-
-    if ( !get_page_from_mfn(l1e_get_mfn(pte), current->domain) )
-        return X86EMUL_UNHANDLEABLE;
-
-    page = l1e_get_page(pte);
-    if ( !page_lock(page) )
-    {
-        put_page(page);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-    {
-        page_unlock(page);
-        put_page(page);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    ctxt->data = &ptwr_ctxt;
-    rc = x86_emulate(ctxt, &ptwr_emulate_ops);
-
-    page_unlock(page);
-    put_page(page);
-
-    return rc;
-}
-
-/*************************
- * fault handling for read-only MMIO pages
- */
-
 int mmio_ro_emulated_write(
     enum x86_segment seg,
     unsigned long offset,
@@ -5009,14 +4750,6 @@ int mmio_ro_emulated_write(
     return X86EMUL_OKAY;
 }
 
-static const struct x86_emulate_ops mmio_ro_emulate_ops = {
-    .read       = x86emul_unhandleable_rw,
-    .insn_fetch = ptwr_emulated_read,
-    .write      = mmio_ro_emulated_write,
-    .validate   = pv_emul_is_mem_write,
-    .cpuid      = pv_emul_cpuid,
-};
-
 int mmcfg_intercept_write(
     enum x86_segment seg,
     unsigned long offset,
@@ -5048,39 +4781,6 @@ int mmcfg_intercept_write(
     return X86EMUL_OKAY;
 }
 
-static const struct x86_emulate_ops mmcfg_intercept_ops = {
-    .read       = x86emul_unhandleable_rw,
-    .insn_fetch = ptwr_emulated_read,
-    .write      = mmcfg_intercept_write,
-    .validate   = pv_emul_is_mem_write,
-    .cpuid      = pv_emul_cpuid,
-};
-
-/* Check if guest is trying to modify a r/o MMIO page. */
-static int mmio_ro_do_page_fault(struct x86_emulate_ctxt *ctxt,
-                                 unsigned long addr, l1_pgentry_t pte)
-{
-    struct mmio_ro_emulate_ctxt mmio_ro_ctxt = { .cr2 = addr };
-    mfn_t mfn = l1e_get_mfn(pte);
-
-    if ( mfn_valid(mfn) )
-    {
-        struct page_info *page = mfn_to_page(mfn);
-        const struct domain *owner = page_get_owner_and_reference(page);
-
-        if ( owner )
-            put_page(page);
-        if ( owner != dom_io )
-            return X86EMUL_UNHANDLEABLE;
-    }
-
-    ctxt->data = &mmio_ro_ctxt;
-    if ( pci_ro_mmcfg_decode(mfn_x(mfn), &mmio_ro_ctxt.seg, &mmio_ro_ctxt.bdf) )
-        return x86_emulate(ctxt, &mmcfg_intercept_ops);
-    else
-        return x86_emulate(ctxt, &mmio_ro_emulate_ops);
-}
-
 void *alloc_xen_pagetable(void)
 {
     if ( system_state != SYS_STATE_early_boot )
@@ -6112,70 +5812,6 @@ void write_32bit_pse_identmap(uint32_t *l2)
                  _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
 }
 
-int pv_ro_page_fault(unsigned long addr, struct cpu_user_regs *regs)
-{
-    l1_pgentry_t pte;
-    const struct domain *currd = current->domain;
-    unsigned int addr_size = is_pv_32bit_domain(currd) ? 32 : BITS_PER_LONG;
-    struct x86_emulate_ctxt ctxt = {
-        .regs      = regs,
-        .vendor    = currd->arch.cpuid->x86_vendor,
-        .addr_size = addr_size,
-        .sp_size   = addr_size,
-        .lma       = addr_size > 32,
-    };
-    int rc;
-    bool mmio_ro;
-
-    /* Attempt to read the PTE that maps the VA being accessed. */
-    pte = guest_get_eff_l1e(addr);
-
-    /* We are only looking for read-only mappings */
-    if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT | _PAGE_RW)) != _PAGE_PRESENT) )
-        return 0;
-
-    mmio_ro = rangeset_contains_singleton(mmio_ro_ranges, l1e_get_pfn(pte));
-    if ( mmio_ro )
-        rc = mmio_ro_do_page_fault(&ctxt, addr, pte);
-    else
-        rc = ptwr_do_page_fault(&ctxt, addr, pte);
-
-    switch ( rc )
-    {
-    case X86EMUL_EXCEPTION:
-        /*
-         * This emulation covers writes to:
-         *  - L1 pagetables.
-         *  - MMCFG space or read-only MFNs.
-         * We tolerate #PF (from hitting an adjacent page or a successful
-         * concurrent pagetable update).  Anything else is an emulation bug,
-         * or a guest playing with the instruction stream under Xen's feet.
-         */
-        if ( ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
-             ctxt.event.vector == TRAP_page_fault )
-            pv_inject_event(&ctxt.event);
-        else
-            gdprintk(XENLOG_WARNING,
-                     "Unexpected event (type %u, vector %#x) from emulation\n",
-                     ctxt.event.type, ctxt.event.vector);
-
-        /* Fallthrough */
-    case X86EMUL_OKAY:
-        if ( ctxt.retire.singlestep )
-            pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
-
-        /* Fallthrough */
-    case X86EMUL_RETRY:
-        if ( mmio_ro )
-            perfc_incr(mmio_ro_emulations);
-        else
-            perfc_incr(ptwr_emulations);
-        return EXCRET_fault_fixed;
-    }
-
-    return 0;
-}
-
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile
index 4e15484471..24af0b550e 100644
--- a/xen/arch/x86/pv/Makefile
+++ b/xen/arch/x86/pv/Makefile
@@ -7,6 +7,7 @@ obj-y += emul-priv-op.o
 obj-y += hypercall.o
 obj-y += iret.o
 obj-y += misc-hypercalls.o
+obj-y += ro-page-fault.o
 obj-y += traps.o
 
 obj-bin-y += dom0_build.init.o
diff --git a/xen/arch/x86/pv/ro-page-fault.c b/xen/arch/x86/pv/ro-page-fault.c
new file mode 100644
index 0000000000..53a3c15a31
--- /dev/null
+++ b/xen/arch/x86/pv/ro-page-fault.c
@@ -0,0 +1,399 @@
+/******************************************************************************
+ * arch/x86/pv/ro-page-fault.c
+ *
+ * Read-only page fault emulation for PV guests
+ *
+ * Copyright (c) 2002-2005 K A Fraser
+ * Copyright (c) 2004 Christian Limpach
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/guest_access.h>
+#include <xen/rangeset.h>
+#include <xen/sched.h>
+#include <xen/trace.h>
+
+#include <asm/domain.h>
+#include <asm/mm.h>
+#include <asm/pci.h>
+#include <asm/pv/mm.h>
+
+#include "emulate.h"
+#include "mm.h"
+
+/* Override macros from asm/page.h to make them work with mfn_t */
+#undef mfn_to_page
+#define mfn_to_page(mfn) __mfn_to_page(mfn_x(mfn))
+#undef page_to_mfn
+#define page_to_mfn(pg) _mfn(__page_to_mfn(pg))
+
+/*********************
+ * Writable Pagetables
+ */
+
+struct ptwr_emulate_ctxt {
+    unsigned long cr2;
+    l1_pgentry_t  pte;
+};
+
+static int ptwr_emulated_read(enum x86_segment seg, unsigned long offset,
+                              void *p_data, unsigned int bytes,
+                              struct x86_emulate_ctxt *ctxt)
+{
+    unsigned int rc = bytes;
+    unsigned long addr = offset;
+
+    if ( !__addr_ok(addr) ||
+         (rc = __copy_from_user(p_data, (void *)addr, bytes)) )
+    {
+        x86_emul_pagefault(0, addr + bytes - rc, ctxt);  /* Read fault. */
+        return X86EMUL_EXCEPTION;
+    }
+
+    return X86EMUL_OKAY;
+}
+
+static int ptwr_emulated_update(unsigned long addr, paddr_t old, paddr_t val,
+                                unsigned int bytes, unsigned int do_cmpxchg,
+                                struct x86_emulate_ctxt *ctxt)
+{
+    unsigned long mfn;
+    unsigned long unaligned_addr = addr;
+    struct page_info *page;
+    l1_pgentry_t pte, ol1e, nl1e, *pl1e;
+    struct vcpu *v = current;
+    struct domain *d = v->domain;
+    struct ptwr_emulate_ctxt *ptwr_ctxt = ctxt->data;
+    int ret;
+
+    /* Only allow naturally-aligned stores within the original %cr2 page. */
+    if ( unlikely(((addr ^ ptwr_ctxt->cr2) & PAGE_MASK) ||
+                  (addr & (bytes - 1))) )
+    {
+        gdprintk(XENLOG_WARNING, "bad access (cr2=%lx, addr=%lx, bytes=%u)\n",
+                 ptwr_ctxt->cr2, addr, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    /* Turn a sub-word access into a full-word access. */
+    if ( bytes != sizeof(paddr_t) )
+    {
+        paddr_t      full;
+        unsigned int rc, offset = addr & (sizeof(paddr_t) - 1);
+
+        /* Align address; read full word. */
+        addr &= ~(sizeof(paddr_t) - 1);
+        if ( (rc = copy_from_user(&full, (void *)addr, sizeof(paddr_t))) != 0 )
+        {
+            x86_emul_pagefault(0, /* Read fault. */
+                               addr + sizeof(paddr_t) - rc,
+                               ctxt);
+            return X86EMUL_EXCEPTION;
+        }
+        /* Mask out bits provided by caller. */
+        full &= ~((((paddr_t)1 << (bytes * 8)) - 1) << (offset * 8));
+        /* Shift the caller value and OR in the missing bits. */
+        val  &= (((paddr_t)1 << (bytes * 8)) - 1);
+        val <<= (offset) * 8;
+        val  |= full;
+        /* Also fill in missing parts of the cmpxchg old value. */
+        old  &= (((paddr_t)1 << (bytes * 8)) - 1);
+        old <<= (offset) * 8;
+        old  |= full;
+    }
+
+    pte  = ptwr_ctxt->pte;
+    mfn  = l1e_get_pfn(pte);
+    page = mfn_to_page(_mfn(mfn));
+
+    /* We are looking only for read-only mappings of p.t. pages. */
+    ASSERT((l1e_get_flags(pte) & (_PAGE_RW|_PAGE_PRESENT)) == _PAGE_PRESENT);
+    ASSERT(mfn_valid(_mfn(mfn)));
+    ASSERT((page->u.inuse.type_info & PGT_type_mask) == PGT_l1_page_table);
+    ASSERT((page->u.inuse.type_info & PGT_count_mask) != 0);
+    ASSERT(page_get_owner(page) == d);
+
+    /* Check the new PTE. */
+    nl1e = l1e_from_intpte(val);
+    switch ( ret = get_page_from_l1e(nl1e, d, d) )
+    {
+    default:
+        if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
+             !do_cmpxchg && (l1e_get_flags(nl1e) & _PAGE_PRESENT) )
+        {
+            /*
+             * If this is an upper-half write to a PAE PTE then we assume that
+             * the guest has simply got the two writes the wrong way round. We
+             * zap the PRESENT bit on the assumption that the bottom half will
+             * be written immediately after we return to the guest.
+             */
+            gdprintk(XENLOG_DEBUG, "ptwr_emulate: fixing up invalid PAE PTE %"
+                     PRIpte"\n", l1e_get_intpte(nl1e));
+            l1e_remove_flags(nl1e, _PAGE_PRESENT);
+        }
+        else
+        {
+            gdprintk(XENLOG_WARNING, "could not get_page_from_l1e()\n");
+            return X86EMUL_UNHANDLEABLE;
+        }
+        break;
+    case 0:
+        break;
+    case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+        ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+        l1e_flip_flags(nl1e, ret);
+        break;
+    }
+
+    nl1e = adjust_guest_l1e(nl1e, d);
+
+    /* Checked successfully: do the update (write or cmpxchg). */
+    pl1e = map_domain_page(_mfn(mfn));
+    pl1e = (l1_pgentry_t *)((unsigned long)pl1e + (addr & ~PAGE_MASK));
+    if ( do_cmpxchg )
+    {
+        bool okay;
+        intpte_t t = old;
+
+        ol1e = l1e_from_intpte(old);
+        okay = paging_cmpxchg_guest_entry(v, &l1e_get_intpte(*pl1e),
+                                          &t, l1e_get_intpte(nl1e), _mfn(mfn));
+        okay = (okay && t == old);
+
+        if ( !okay )
+        {
+            unmap_domain_page(pl1e);
+            put_page_from_l1e(nl1e, d);
+            return X86EMUL_RETRY;
+        }
+    }
+    else
+    {
+        ol1e = *pl1e;
+        if ( !UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn, v, 0) )
+            BUG();
+    }
+
+    trace_ptwr_emulation(addr, nl1e);
+
+    unmap_domain_page(pl1e);
+
+    /* Finally, drop the old PTE. */
+    put_page_from_l1e(ol1e, d);
+
+    return X86EMUL_OKAY;
+}
+
+static int ptwr_emulated_write(enum x86_segment seg, unsigned long offset,
+                               void *p_data, unsigned int bytes,
+                               struct x86_emulate_ctxt *ctxt)
+{
+    paddr_t val = 0;
+
+    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) || !bytes )
+    {
+        gdprintk(XENLOG_WARNING, "bad write size (addr=%lx, bytes=%u)\n",
+                 offset, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    memcpy(&val, p_data, bytes);
+
+    return ptwr_emulated_update(offset, 0, val, bytes, 0, ctxt);
+}
+
+static int ptwr_emulated_cmpxchg(enum x86_segment seg, unsigned long offset,
+                                 void *p_old, void *p_new, unsigned int bytes,
+                                 struct x86_emulate_ctxt *ctxt)
+{
+    paddr_t old = 0, new = 0;
+
+    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) )
+    {
+        gdprintk(XENLOG_WARNING, "bad cmpxchg size (addr=%lx, bytes=%u)\n",
+                 offset, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    memcpy(&old, p_old, bytes);
+    memcpy(&new, p_new, bytes);
+
+    return ptwr_emulated_update(offset, old, new, bytes, 1, ctxt);
+}
+
+static const struct x86_emulate_ops ptwr_emulate_ops = {
+    .read       = ptwr_emulated_read,
+    .insn_fetch = ptwr_emulated_read,
+    .write      = ptwr_emulated_write,
+    .cmpxchg    = ptwr_emulated_cmpxchg,
+    .validate   = pv_emul_is_mem_write,
+    .cpuid      = pv_emul_cpuid,
+};
+
+/* Write page fault handler: check if guest is trying to modify a PTE. */
+static int ptwr_do_page_fault(struct x86_emulate_ctxt *ctxt,
+                              unsigned long addr, l1_pgentry_t pte)
+{
+    struct ptwr_emulate_ctxt ptwr_ctxt = {
+        .cr2 = addr,
+        .pte = pte,
+    };
+    struct page_info *page;
+    int rc;
+
+    if ( !get_page_from_mfn(l1e_get_mfn(pte), current->domain) )
+        return X86EMUL_UNHANDLEABLE;
+
+    page = l1e_get_page(pte);
+    if ( !page_lock(page) )
+    {
+        put_page(page);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(page);
+        put_page(page);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    ctxt->data = &ptwr_ctxt;
+    rc = x86_emulate(ctxt, &ptwr_emulate_ops);
+
+    page_unlock(page);
+    put_page(page);
+
+    return rc;
+}
+
+/*****************************************
+ * fault handling for read-only MMIO pages
+ */
+
+static const struct x86_emulate_ops mmio_ro_emulate_ops = {
+    .read       = x86emul_unhandleable_rw,
+    .insn_fetch = ptwr_emulated_read,
+    .write      = mmio_ro_emulated_write,
+    .validate   = pv_emul_is_mem_write,
+    .cpuid      = pv_emul_cpuid,
+};
+
+static const struct x86_emulate_ops mmcfg_intercept_ops = {
+    .read       = x86emul_unhandleable_rw,
+    .insn_fetch = ptwr_emulated_read,
+    .write      = mmcfg_intercept_write,
+    .validate   = pv_emul_is_mem_write,
+    .cpuid      = pv_emul_cpuid,
+};
+
+/* Check if guest is trying to modify a r/o MMIO page. */
+static int mmio_ro_do_page_fault(struct x86_emulate_ctxt *ctxt,
+                                 unsigned long addr, l1_pgentry_t pte)
+{
+    struct mmio_ro_emulate_ctxt mmio_ro_ctxt = { .cr2 = addr };
+    mfn_t mfn = l1e_get_mfn(pte);
+
+    if ( mfn_valid(mfn) )
+    {
+        struct page_info *page = mfn_to_page(mfn);
+        const struct domain *owner = page_get_owner_and_reference(page);
+
+        if ( owner )
+            put_page(page);
+        if ( owner != dom_io )
+            return X86EMUL_UNHANDLEABLE;
+    }
+
+    ctxt->data = &mmio_ro_ctxt;
+    if ( pci_ro_mmcfg_decode(mfn_x(mfn), &mmio_ro_ctxt.seg, &mmio_ro_ctxt.bdf) )
+        return x86_emulate(ctxt, &mmcfg_intercept_ops);
+    else
+        return x86_emulate(ctxt, &mmio_ro_emulate_ops);
+}
+
+int pv_ro_page_fault(unsigned long addr, struct cpu_user_regs *regs)
+{
+    l1_pgentry_t pte;
+    const struct domain *currd = current->domain;
+    unsigned int addr_size = is_pv_32bit_domain(currd) ? 32 : BITS_PER_LONG;
+    struct x86_emulate_ctxt ctxt = {
+        .regs      = regs,
+        .vendor    = currd->arch.cpuid->x86_vendor,
+        .addr_size = addr_size,
+        .sp_size   = addr_size,
+        .lma       = addr_size > 32,
+    };
+    int rc;
+    bool mmio_ro;
+
+    /* Attempt to read the PTE that maps the VA being accessed. */
+    pte = guest_get_eff_l1e(addr);
+
+    /* We are only looking for read-only mappings */
+    if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT | _PAGE_RW)) != _PAGE_PRESENT) )
+        return 0;
+
+    mmio_ro = rangeset_contains_singleton(mmio_ro_ranges, l1e_get_pfn(pte));
+    if ( mmio_ro )
+        rc = mmio_ro_do_page_fault(&ctxt, addr, pte);
+    else
+        rc = ptwr_do_page_fault(&ctxt, addr, pte);
+
+    switch ( rc )
+    {
+    case X86EMUL_EXCEPTION:
+        /*
+         * This emulation covers writes to:
+         *  - L1 pagetables.
+         *  - MMCFG space or read-only MFNs.
+         * We tolerate #PF (from hitting an adjacent page or a successful
+         * concurrent pagetable update).  Anything else is an emulation bug,
+         * or a guest playing with the instruction stream under Xen's feet.
+         */
+        if ( ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
+             ctxt.event.vector == TRAP_page_fault )
+            pv_inject_event(&ctxt.event);
+        else
+            gdprintk(XENLOG_WARNING,
+                     "Unexpected event (type %u, vector %#x) from emulation\n",
+                     ctxt.event.type, ctxt.event.vector);
+
+        /* Fallthrough */
+    case X86EMUL_OKAY:
+        if ( ctxt.retire.singlestep )
+            pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
+
+        /* Fallthrough */
+    case X86EMUL_RETRY:
+        if ( mmio_ro )
+            perfc_incr(mmio_ro_emulations);
+        else
+            perfc_incr(ptwr_emulations);
+        return EXCRET_fault_fixed;
+    }
+
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 6091f239ce..d84db4acda 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -78,6 +78,7 @@
 #include <asm/cpuid.h>
 #include <xsm/xsm.h>
 #include <asm/pv/traps.h>
+#include <asm/pv/mm.h>
 
 /*
  * opt_nmi: one of 'ignore', 'dom0', or 'fatal'.
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 7670912e0a..a48d75d434 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -525,8 +525,6 @@ extern int mmcfg_intercept_write(enum x86_segment seg,
 int pv_emul_cpuid(uint32_t leaf, uint32_t subleaf,
                   struct cpuid_leaf *res, struct x86_emulate_ctxt *ctxt);
 
-int pv_ro_page_fault(unsigned long, struct cpu_user_regs *);
-
 int audit_adjust_pgtables(struct domain *d, int dir, int noisy);
 
 extern int pagefault_by_memadd(unsigned long addr, struct cpu_user_regs *regs);
diff --git a/xen/include/asm-x86/pv/mm.h b/xen/include/asm-x86/pv/mm.h
new file mode 100644
index 0000000000..e251e1ef06
--- /dev/null
+++ b/xen/include/asm-x86/pv/mm.h
@@ -0,0 +1,38 @@
+/*
+ * asm-x86/pv/mm.h
+ *
+ * Memory management interfaces for PV guests
+ *
+ * Copyright (C) 2017 Wei Liu <wei.liu2@citrix.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __X86_PV_MM_H__
+#define __X86_PV_MM_H__
+
+#ifdef CONFIG_PV
+
+int pv_ro_page_fault(unsigned long addr, struct cpu_user_regs *regs);
+
+#else
+
+static inline int pv_ro_page_fault(unsigned long addr,
+                                   struct cpu_user_regs *regs)
+{
+    return 0;
+}
+
+#endif
+
+#endif /* __X86_PV_MM_H__ */
-- 
2.11.0



* [PATCH v5 06/23] x86/mm: remove the now unused inclusion of pv/emulate.h
  2017-09-14 12:58 [PATCH v5 00/23] x86: refactor mm.c Wei Liu
                   ` (4 preceding siblings ...)
  2017-09-14 12:58 ` [PATCH v5 05/23] x86/mm: move ro page fault emulation code Wei Liu
@ 2017-09-14 12:58 ` Wei Liu
  2017-09-22 12:37   ` Jan Beulich
  2017-09-14 12:58 ` [PATCH v5 07/23] x86/mm: move map_guest_l1e to pv/mm.c Wei Liu
                   ` (16 subsequent siblings)
  22 siblings, 1 reply; 57+ messages in thread
From: Wei Liu @ 2017-09-14 12:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index cc0a7cab41..e77abb015b 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -126,7 +126,6 @@
 #include <asm/hvm/grant_table.h>
 #include <asm/pv/grant_table.h>
 
-#include "pv/emulate.h"
 #include "pv/mm.h"
 
 /* Override macros from asm/page.h to make them work with mfn_t */
-- 
2.11.0



* [PATCH v5 07/23] x86/mm: move map_guest_l1e to pv/mm.c
  2017-09-14 12:58 [PATCH v5 00/23] x86: refactor mm.c Wei Liu
                   ` (5 preceding siblings ...)
  2017-09-14 12:58 ` [PATCH v5 06/23] x86/mm: remove the now unused inclusion of pv/emulate.h Wei Liu
@ 2017-09-14 12:58 ` Wei Liu
  2017-09-22 12:58   ` Jan Beulich
  2017-09-14 12:58 ` [PATCH v5 08/23] x86/mm: split out pv grant table code Wei Liu
                   ` (15 subsequent siblings)
  22 siblings, 1 reply; 57+ messages in thread
From: Wei Liu @ 2017-09-14 12:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

And export the function via pv/mm.h.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
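As a usage sketch (hypothetical caller), the map/unmap discipline the
function comment prescribes looks like this:

static bool example_read_l1e(unsigned long linear, l1_pgentry_t *out)
{
    mfn_t gl1mfn;
    l1_pgentry_t *pl1e = map_guest_l1e(linear, &gl1mfn);

    if ( !pl1e )
        return false;        /* Bad address, or L2 entry not safely readable. */

    *out = *pl1e;            /* ... operate on the mapped entry ... */

    unmap_domain_page(pl1e); /* Always pair with the map. */

    return true;
}
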
 xen/arch/x86/mm.c        | 29 --------------------
 xen/arch/x86/pv/Makefile |  1 +
 xen/arch/x86/pv/mm.c     | 69 ++++++++++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/pv/mm.h     |  2 ++
 4 files changed, 72 insertions(+), 29 deletions(-)
 create mode 100644 xen/arch/x86/pv/mm.c

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e77abb015b..29d8e18819 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -526,35 +526,6 @@ void update_cr3(struct vcpu *v)
     make_cr3(v, cr3_mfn);
 }
 
-/*
- * Get a mapping of a PV guest's l1e for this linear address.  The return
- * pointer should be unmapped using unmap_domain_page().
- */
-static l1_pgentry_t *map_guest_l1e(unsigned long linear, mfn_t *gl1mfn)
-{
-    l2_pgentry_t l2e;
-
-    ASSERT(!paging_mode_translate(current->domain));
-    ASSERT(!paging_mode_external(current->domain));
-
-    if ( unlikely(!__addr_ok(linear)) )
-        return NULL;
-
-    /* Find this l1e and its enclosing l1mfn in the linear map. */
-    if ( __copy_from_user(&l2e,
-                          &__linear_l2_table[l2_linear_offset(linear)],
-                          sizeof(l2_pgentry_t)) )
-        return NULL;
-
-    /* Check flags that it will be safe to read the l1e. */
-    if ( (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT )
-        return NULL;
-
-    *gl1mfn = l2e_get_mfn(l2e);
-
-    return (l1_pgentry_t *)map_domain_page(*gl1mfn) + l1_table_offset(linear);
-}
-
 /*
  * Read the guest's l1e that maps this address, from the kernel-mode
  * page tables.
diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile
index 24af0b550e..d4fcc2ab94 100644
--- a/xen/arch/x86/pv/Makefile
+++ b/xen/arch/x86/pv/Makefile
@@ -7,6 +7,7 @@ obj-y += emul-priv-op.o
 obj-y += hypercall.o
 obj-y += iret.o
 obj-y += misc-hypercalls.o
+obj-y += mm.o
 obj-y += ro-page-fault.o
 obj-y += traps.o
 
diff --git a/xen/arch/x86/pv/mm.c b/xen/arch/x86/pv/mm.c
new file mode 100644
index 0000000000..4bfa322788
--- /dev/null
+++ b/xen/arch/x86/pv/mm.c
@@ -0,0 +1,69 @@
+/*
+ * pv/mm.c
+ *
+ * Memory management code for PV guests
+ *
+ * Copyright (c) 2002-2005 K A Fraser
+ * Copyright (c) 2004 Christian Limpach
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/guest_access.h>
+
+#include <asm/current.h>
+
+/* Override macros from asm/page.h to make them work with mfn_t */
+#undef mfn_to_page
+#define mfn_to_page(mfn) __mfn_to_page(mfn_x(mfn))
+#undef page_to_mfn
+#define page_to_mfn(pg) _mfn(__page_to_mfn(pg))
+
+/*
+ * Get a mapping of a PV guest's l1e for this linear address.  The return
+ * pointer should be unmapped using unmap_domain_page().
+ */
+l1_pgentry_t *map_guest_l1e(unsigned long linear, mfn_t *gl1mfn)
+{
+    l2_pgentry_t l2e;
+
+    ASSERT(!paging_mode_translate(current->domain));
+    ASSERT(!paging_mode_external(current->domain));
+
+    if ( unlikely(!__addr_ok(linear)) )
+        return NULL;
+
+    /* Find this l1e and its enclosing l1mfn in the linear map. */
+    if ( __copy_from_user(&l2e,
+                          &__linear_l2_table[l2_linear_offset(linear)],
+                          sizeof(l2_pgentry_t)) )
+        return NULL;
+
+    /* Check flags that it will be safe to read the l1e. */
+    if ( (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT )
+        return NULL;
+
+    *gl1mfn = l2e_get_mfn(l2e);
+
+    return (l1_pgentry_t *)map_domain_page(*gl1mfn) + l1_table_offset(linear);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/pv/mm.h b/xen/arch/x86/pv/mm.h
index 6602e6ce07..acef061acf 100644
--- a/xen/arch/x86/pv/mm.h
+++ b/xen/arch/x86/pv/mm.h
@@ -1,6 +1,8 @@
 #ifndef __PV_MM_H__
 #define __PV_MM_H__
 
+l1_pgentry_t *map_guest_l1e(unsigned long linear, mfn_t *gl1mfn);
+
 /* Read a PV guest's l1e that maps this linear address. */
 static inline l1_pgentry_t guest_get_eff_l1e(unsigned long linear)
 {
-- 
2.11.0



* [PATCH v5 08/23] x86/mm: split out pv grant table code
  2017-09-14 12:58 [PATCH v5 00/23] x86: refactor mm.c Wei Liu
                   ` (6 preceding siblings ...)
  2017-09-14 12:58 ` [PATCH v5 07/23] x86/mm: move map_guest_l1e to pv/mm.c Wei Liu
@ 2017-09-14 12:58 ` Wei Liu
  2017-09-22 12:59   ` Jan Beulich
  2017-09-14 12:58 ` [PATCH v5 09/23] x86/mm: add pv prefix to {set, destroy}_gdt Wei Liu
                   ` (14 subsequent siblings)
  22 siblings, 1 reply; 57+ messages in thread
From: Wei Liu @ 2017-09-14 12:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Move the code to pv/grant_table.c. Nothing needs to be done with
regard to headers.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
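Note for reviewers: below is a minimal, self-contained sketch of the
flag translation done by grant_to_pte_flags().  The GNTMAP_*/PAGE_*
values are illustrative placeholders rather than Xen's real
definitions, and the MASK_INSR()/cache-attribute handling is omitted.

    /* Build with: gcc -Wall grant_flags.c */
    #include <stdio.h>

    #define GNTMAP_readonly        (1u << 2)  /* placeholder bits */
    #define GNTMAP_application_map (1u << 3)

    #define PAGE_PRESENT (1u << 0)            /* placeholder PTE flags */
    #define PAGE_RW      (1u << 1)
    #define PAGE_USER    (1u << 2)

    static unsigned int grant_to_pte_flags(unsigned int grant_flags)
    {
        unsigned int pte_flags = PAGE_PRESENT;

        /* Only mappings meant for guest userspace get the user bit. */
        if ( grant_flags & GNTMAP_application_map )
            pte_flags |= PAGE_USER;

        /* Writable unless the grant was established read-only. */
        if ( !(grant_flags & GNTMAP_readonly) )
            pte_flags |= PAGE_RW;

        return pte_flags;
    }

    int main(void)
    {
        /* Read-only application mapping: present + user, no RW. */
        printf("%#x\n",
               grant_to_pte_flags(GNTMAP_readonly |
                                  GNTMAP_application_map));
        return 0;
    }
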
 xen/arch/x86/mm.c             | 283 ------------------------------------
 xen/arch/x86/pv/Makefile      |   1 +
 xen/arch/x86/pv/grant_table.c | 327 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 328 insertions(+), 283 deletions(-)
 create mode 100644 xen/arch/x86/pv/grant_table.c

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 29d8e18819..69a47d87d6 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3630,289 +3630,6 @@ long do_mmu_update(
     return rc;
 }
 
-static unsigned int grant_to_pte_flags(unsigned int grant_flags,
-                                       unsigned int cache_flags)
-{
-    unsigned int pte_flags =
-        _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_GNTTAB | _PAGE_NX;
-
-    if ( grant_flags & GNTMAP_application_map )
-        pte_flags |= _PAGE_USER;
-    if ( !(grant_flags & GNTMAP_readonly) )
-        pte_flags |= _PAGE_RW;
-
-    pte_flags |= MASK_INSR((grant_flags >> _GNTMAP_guest_avail0), _PAGE_AVAIL);
-    pte_flags |= cacheattr_to_pte_flags(cache_flags >> 5);
-
-    return pte_flags;
-}
-
-int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
-                            unsigned int flags, unsigned int cache_flags)
-{
-    struct vcpu *curr = current;
-    struct domain *currd = curr->domain;
-    l1_pgentry_t nl1e, ol1e, *pl1e;
-    struct page_info *page;
-    mfn_t gl1mfn;
-    int rc = GNTST_general_error;
-
-    nl1e = l1e_from_pfn(frame, grant_to_pte_flags(flags, cache_flags));
-    nl1e = adjust_guest_l1e(nl1e, currd);
-
-    /*
-     * The meaning of addr depends on GNTMAP_contains_pte.  It is either a
-     * machine address of an L1e the guest has nominated to be altered, or a
-     * linear address we need to look up the appropriate L1e for.
-     */
-    if ( flags & GNTMAP_contains_pte )
-    {
-        /* addr must be suitably aligned, or we will corrupt adjacent ptes. */
-        if ( !IS_ALIGNED(addr, sizeof(nl1e)) )
-        {
-            gdprintk(XENLOG_WARNING,
-                     "Misaligned PTE address %"PRIx64"\n", addr);
-            goto out;
-        }
-
-        gl1mfn = _mfn(addr >> PAGE_SHIFT);
-
-        if ( !get_page_from_mfn(gl1mfn, currd) )
-            goto out;
-
-        pl1e = map_domain_page(gl1mfn) + (addr & ~PAGE_MASK);
-    }
-    else
-    {
-        /* Guest trying to pass an out-of-range linear address? */
-        if ( is_pv_32bit_domain(currd) && addr != (uint32_t)addr )
-            goto out;
-
-        pl1e = map_guest_l1e(addr, &gl1mfn);
-
-        if ( !pl1e )
-        {
-            gdprintk(XENLOG_WARNING,
-                     "Could not find L1 PTE for linear address %"PRIx64"\n",
-                     addr);
-            goto out;
-        }
-
-        if ( !get_page_from_mfn(gl1mfn, currd) )
-            goto out_unmap;
-    }
-
-    page = mfn_to_page(gl1mfn);
-    if ( !page_lock(page) )
-        goto out_put;
-
-    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-        goto out_unlock;
-
-    ol1e = *pl1e;
-    if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn_x(gl1mfn), curr, 0) )
-        rc = GNTST_okay;
-
- out_unlock:
-    page_unlock(page);
- out_put:
-    put_page(page);
- out_unmap:
-    unmap_domain_page(pl1e);
-
-    if ( rc == GNTST_okay )
-        put_page_from_l1e(ol1e, currd);
-
- out:
-    return rc;
-}
-
-/*
- * This exists soley for implementing GNTABOP_unmap_and_replace, the ABI of
- * which is bizarre.  This GNTTABOP isn't used any more, but was used by
- * classic-xen kernels and PVOps Linux before the M2P_OVERRIDE infrastructure
- * was replaced with something which actually worked.
- *
- * Look up the L1e mapping linear, and zap it.  Return the L1e via *out.
- * Returns a boolean indicating success.  If success, the caller is
- * responsible for calling put_page_from_l1e().
- */
-static bool steal_linear_address(unsigned long linear, l1_pgentry_t *out)
-{
-    struct vcpu *curr = current;
-    struct domain *currd = curr->domain;
-    l1_pgentry_t *pl1e, ol1e;
-    struct page_info *page;
-    mfn_t gl1mfn;
-    bool okay = false;
-
-    ASSERT(is_pv_domain(currd));
-
-    pl1e = map_guest_l1e(linear, &gl1mfn);
-    if ( !pl1e )
-    {
-        gdprintk(XENLOG_WARNING,
-                 "Could not find L1 PTE for linear %"PRIx64"\n", linear);
-        goto out;
-    }
-
-    if ( !get_page_from_mfn(gl1mfn, currd) )
-        goto out_unmap;
-
-    page = mfn_to_page(gl1mfn);
-    if ( !page_lock(page) )
-        goto out_put;
-
-    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-        goto out_unlock;
-
-    ol1e = *pl1e;
-    okay = UPDATE_ENTRY(l1, pl1e, ol1e, l1e_empty(), mfn_x(gl1mfn), curr, 0);
-
- out_unlock:
-    page_unlock(page);
- out_put:
-    put_page(page);
- out_unmap:
-    unmap_domain_page(pl1e);
-
-    if ( okay )
-        *out = ol1e;
-
- out:
-    return okay;
-}
-
-/*
- * Passing a new_addr of zero is taken to mean destroy.  Passing a non-zero
- * new_addr has only ever been available via GNTABOP_unmap_and_replace, and
- * only when !(flags & GNTMAP_contains_pte).
- */
-int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
-                             uint64_t new_addr, unsigned int flags)
-{
-    struct vcpu *curr = current;
-    struct domain *currd = curr->domain;
-    l1_pgentry_t nl1e = l1e_empty(), ol1e, *pl1e;
-    struct page_info *page;
-    mfn_t gl1mfn;
-    int rc = GNTST_general_error;
-    unsigned int grant_pte_flags = grant_to_pte_flags(flags, 0);
-
-    /*
-     * On top of the explicit settings done by create_grant_pv_mapping()
-     * also open-code relevant parts of adjust_guest_l1e(). Don't mirror
-     * available and cachability flags, though.
-     */
-    if ( !is_pv_32bit_domain(currd) )
-        grant_pte_flags |= (grant_pte_flags & _PAGE_USER)
-                           ? _PAGE_GLOBAL
-                           : _PAGE_GUEST_KERNEL | _PAGE_USER;
-
-    /*
-     * addr comes from Xen's active_entry tracking, and was used successfully
-     * to create a grant.
-     *
-     * The meaning of addr depends on GNTMAP_contains_pte.  It is either a
-     * machine address of an L1e the guest has nominated to be altered, or a
-     * linear address we need to look up the appropriate L1e for.
-     */
-    if ( flags & GNTMAP_contains_pte )
-    {
-        /* Replace not available in this addressing mode. */
-        if ( new_addr )
-            goto out;
-
-        /* Sanity check that we won't clobber the pagetable. */
-        if ( !IS_ALIGNED(addr, sizeof(nl1e)) )
-        {
-            ASSERT_UNREACHABLE();
-            goto out;
-        }
-
-        gl1mfn = _mfn(addr >> PAGE_SHIFT);
-
-        if ( !get_page_from_mfn(gl1mfn, currd) )
-            goto out;
-
-        pl1e = map_domain_page(gl1mfn) + (addr & ~PAGE_MASK);
-    }
-    else
-    {
-        if ( is_pv_32bit_domain(currd) )
-        {
-            if ( addr != (uint32_t)addr )
-            {
-                ASSERT_UNREACHABLE();
-                goto out;
-            }
-
-            /* Guest trying to pass an out-of-range linear address? */
-            if ( new_addr != (uint32_t)new_addr )
-                goto out;
-        }
-
-        if ( new_addr && !steal_linear_address(new_addr, &nl1e) )
-            goto out;
-
-        pl1e = map_guest_l1e(addr, &gl1mfn);
-
-        if ( !pl1e )
-            goto out;
-
-        if ( !get_page_from_mfn(gl1mfn, currd) )
-            goto out_unmap;
-    }
-
-    page = mfn_to_page(gl1mfn);
-
-    if ( !page_lock(page) )
-        goto out_put;
-
-    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-        goto out_unlock;
-
-    ol1e = *pl1e;
-
-    /*
-     * Check that the address supplied is actually mapped to frame (with
-     * appropriate permissions).
-     */
-    if ( unlikely(l1e_get_pfn(ol1e) != frame) ||
-         unlikely((l1e_get_flags(ol1e) ^ grant_pte_flags) &
-                  (_PAGE_PRESENT | _PAGE_RW)) )
-    {
-        gdprintk(XENLOG_ERR,
-                 "PTE %"PRIpte" for %"PRIx64" doesn't match grant (%"PRIpte")\n",
-                 l1e_get_intpte(ol1e), addr,
-                 l1e_get_intpte(l1e_from_pfn(frame, grant_pte_flags)));
-        goto out_unlock;
-    }
-
-    if ( unlikely((l1e_get_flags(ol1e) ^ grant_pte_flags) &
-                  ~(_PAGE_AVAIL | PAGE_CACHE_ATTRS)) )
-        gdprintk(XENLOG_WARNING,
-                 "PTE flags %x for %"PRIx64" don't match grant (%x)\n",
-                 l1e_get_flags(ol1e), addr, grant_pte_flags);
-
-    if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn_x(gl1mfn), curr, 0) )
-        rc = GNTST_okay;
-
- out_unlock:
-    page_unlock(page);
- out_put:
-    put_page(page);
- out_unmap:
-    unmap_domain_page(pl1e);
-
- out:
-    /* If there was an error, we are still responsible for the stolen pte. */
-    if ( rc )
-        put_page_from_l1e(nl1e, currd);
-
-    return rc;
-}
-
 int donate_page(
     struct domain *d, struct page_info *page, unsigned int memflags)
 {
diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile
index d4fcc2ab94..a692ee6432 100644
--- a/xen/arch/x86/pv/Makefile
+++ b/xen/arch/x86/pv/Makefile
@@ -4,6 +4,7 @@ obj-y += emulate.o
 obj-y += emul-gate-op.o
 obj-y += emul-inv-op.o
 obj-y += emul-priv-op.o
+obj-y += grant_table.o
 obj-y += hypercall.o
 obj-y += iret.o
 obj-y += misc-hypercalls.o
diff --git a/xen/arch/x86/pv/grant_table.c b/xen/arch/x86/pv/grant_table.c
new file mode 100644
index 0000000000..81988a3b57
--- /dev/null
+++ b/xen/arch/x86/pv/grant_table.c
@@ -0,0 +1,327 @@
+/*
+ * pv/grant_table.c
+ *
+ * Grant table interfaces for PV guests
+ *
+ * Copyright (C) 2017 Wei Liu <wei.liu2@citrix.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/types.h>
+
+#include <public/grant_table.h>
+
+#include <asm/p2m.h>
+#include <asm/pv/mm.h>
+
+#include "mm.h"
+
+/* Override macros from asm/page.h to make them work with mfn_t */
+#undef mfn_to_page
+#define mfn_to_page(mfn) __mfn_to_page(mfn_x(mfn))
+#undef page_to_mfn
+#define page_to_mfn(pg) _mfn(__page_to_mfn(pg))
+
+static unsigned int grant_to_pte_flags(unsigned int grant_flags,
+                                       unsigned int cache_flags)
+{
+    unsigned int pte_flags =
+        _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_GNTTAB | _PAGE_NX;
+
+    if ( grant_flags & GNTMAP_application_map )
+        pte_flags |= _PAGE_USER;
+    if ( !(grant_flags & GNTMAP_readonly) )
+        pte_flags |= _PAGE_RW;
+
+    pte_flags |= MASK_INSR((grant_flags >> _GNTMAP_guest_avail0), _PAGE_AVAIL);
+    pte_flags |= cacheattr_to_pte_flags(cache_flags >> 5);
+
+    return pte_flags;
+}
+
+int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
+                            unsigned int flags, unsigned int cache_flags)
+{
+    struct vcpu *curr = current;
+    struct domain *currd = curr->domain;
+    l1_pgentry_t nl1e, ol1e, *pl1e;
+    struct page_info *page;
+    mfn_t gl1mfn;
+    int rc = GNTST_general_error;
+
+    nl1e = l1e_from_pfn(frame, grant_to_pte_flags(flags, cache_flags));
+    nl1e = adjust_guest_l1e(nl1e, currd);
+
+    /*
+     * The meaning of addr depends on GNTMAP_contains_pte.  It is either a
+     * machine address of an L1e the guest has nominated to be altered, or a
+     * linear address we need to look up the appropriate L1e for.
+     */
+    if ( flags & GNTMAP_contains_pte )
+    {
+        /* addr must be suitably aligned, or we will corrupt adjacent ptes. */
+        if ( !IS_ALIGNED(addr, sizeof(nl1e)) )
+        {
+            gdprintk(XENLOG_WARNING,
+                     "Misaligned PTE address %"PRIx64"\n", addr);
+            goto out;
+        }
+
+        gl1mfn = _mfn(addr >> PAGE_SHIFT);
+
+        if ( !get_page_from_mfn(gl1mfn, currd) )
+            goto out;
+
+        pl1e = map_domain_page(gl1mfn) + (addr & ~PAGE_MASK);
+    }
+    else
+    {
+        /* Guest trying to pass an out-of-range linear address? */
+        if ( is_pv_32bit_domain(currd) && addr != (uint32_t)addr )
+            goto out;
+
+        pl1e = map_guest_l1e(addr, &gl1mfn);
+
+        if ( !pl1e )
+        {
+            gdprintk(XENLOG_WARNING,
+                     "Could not find L1 PTE for linear address %"PRIx64"\n",
+                     addr);
+            goto out;
+        }
+
+        if ( !get_page_from_mfn(gl1mfn, currd) )
+            goto out_unmap;
+    }
+
+    page = mfn_to_page(gl1mfn);
+    if ( !page_lock(page) )
+        goto out_put;
+
+    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+        goto out_unlock;
+
+    ol1e = *pl1e;
+    if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn_x(gl1mfn), curr, 0) )
+        rc = GNTST_okay;
+
+ out_unlock:
+    page_unlock(page);
+ out_put:
+    put_page(page);
+ out_unmap:
+    unmap_domain_page(pl1e);
+
+    if ( rc == GNTST_okay )
+        put_page_from_l1e(ol1e, currd);
+
+ out:
+    return rc;
+}
+
+/*
+ * This exists solely for implementing GNTTABOP_unmap_and_replace, the ABI of
+ * which is bizarre.  This GNTTABOP isn't used any more, but was used by
+ * classic-xen kernels and PVOps Linux before the M2P_OVERRIDE infrastructure
+ * was replaced with something which actually worked.
+ *
+ * Look up the L1e mapping linear, and zap it.  Return the L1e via *out.
+ * Returns a boolean indicating success.  If success, the caller is
+ * responsible for calling put_page_from_l1e().
+ */
+static bool steal_linear_address(unsigned long linear, l1_pgentry_t *out)
+{
+    struct vcpu *curr = current;
+    struct domain *currd = curr->domain;
+    l1_pgentry_t *pl1e, ol1e;
+    struct page_info *page;
+    mfn_t gl1mfn;
+    bool okay = false;
+
+    ASSERT(is_pv_domain(currd));
+
+    pl1e = map_guest_l1e(linear, &gl1mfn);
+    if ( !pl1e )
+    {
+        gdprintk(XENLOG_WARNING,
+                 "Could not find L1 PTE for linear %"PRIx64"\n", linear);
+        goto out;
+    }
+
+    if ( !get_page_from_mfn(gl1mfn, currd) )
+        goto out_unmap;
+
+    page = mfn_to_page(gl1mfn);
+    if ( !page_lock(page) )
+        goto out_put;
+
+    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+        goto out_unlock;
+
+    ol1e = *pl1e;
+    okay = UPDATE_ENTRY(l1, pl1e, ol1e, l1e_empty(), mfn_x(gl1mfn), curr, 0);
+
+ out_unlock:
+    page_unlock(page);
+ out_put:
+    put_page(page);
+ out_unmap:
+    unmap_domain_page(pl1e);
+
+    if ( okay )
+        *out = ol1e;
+
+ out:
+    return okay;
+}
+
+/*
+ * Passing a new_addr of zero is taken to mean destroy.  Passing a non-zero
+ * new_addr has only ever been available via GNTTABOP_unmap_and_replace, and
+ * only when !(flags & GNTMAP_contains_pte).
+ */
+int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
+                             uint64_t new_addr, unsigned int flags)
+{
+    struct vcpu *curr = current;
+    struct domain *currd = curr->domain;
+    l1_pgentry_t nl1e = l1e_empty(), ol1e, *pl1e;
+    struct page_info *page;
+    mfn_t gl1mfn;
+    int rc = GNTST_general_error;
+    unsigned int grant_pte_flags = grant_to_pte_flags(flags, 0);
+
+    /*
+     * On top of the explicit settings done by create_grant_pv_mapping()
+     * also open-code relevant parts of adjust_guest_l1e(). Don't mirror
+     * available and cacheability flags, though.
+     */
+    if ( !is_pv_32bit_domain(currd) )
+        grant_pte_flags |= (grant_pte_flags & _PAGE_USER)
+                           ? _PAGE_GLOBAL
+                           : _PAGE_GUEST_KERNEL | _PAGE_USER;
+
+    /*
+     * addr comes from Xen's active_entry tracking, and was used successfully
+     * to create a grant.
+     *
+     * The meaning of addr depends on GNTMAP_contains_pte.  It is either a
+     * machine address of an L1e the guest has nominated to be altered, or a
+     * linear address we need to look up the appropriate L1e for.
+     */
+    if ( flags & GNTMAP_contains_pte )
+    {
+        /* Replace not available in this addressing mode. */
+        if ( new_addr )
+            goto out;
+
+        /* Sanity check that we won't clobber the pagetable. */
+        if ( !IS_ALIGNED(addr, sizeof(nl1e)) )
+        {
+            ASSERT_UNREACHABLE();
+            goto out;
+        }
+
+        gl1mfn = _mfn(addr >> PAGE_SHIFT);
+
+        if ( !get_page_from_mfn(gl1mfn, currd) )
+            goto out;
+
+        pl1e = map_domain_page(gl1mfn) + (addr & ~PAGE_MASK);
+    }
+    else
+    {
+        if ( is_pv_32bit_domain(currd) )
+        {
+            if ( addr != (uint32_t)addr )
+            {
+                ASSERT_UNREACHABLE();
+                goto out;
+            }
+
+            /* Guest trying to pass an out-of-range linear address? */
+            if ( new_addr != (uint32_t)new_addr )
+                goto out;
+        }
+
+        if ( new_addr && !steal_linear_address(new_addr, &nl1e) )
+            goto out;
+
+        pl1e = map_guest_l1e(addr, &gl1mfn);
+
+        if ( !pl1e )
+            goto out;
+
+        if ( !get_page_from_mfn(gl1mfn, currd) )
+            goto out_unmap;
+    }
+
+    page = mfn_to_page(gl1mfn);
+
+    if ( !page_lock(page) )
+        goto out_put;
+
+    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+        goto out_unlock;
+
+    ol1e = *pl1e;
+
+    /*
+     * Check that the address supplied is actually mapped to frame (with
+     * appropriate permissions).
+     */
+    if ( unlikely(l1e_get_pfn(ol1e) != frame) ||
+         unlikely((l1e_get_flags(ol1e) ^ grant_pte_flags) &
+                  (_PAGE_PRESENT | _PAGE_RW)) )
+    {
+        gdprintk(XENLOG_ERR,
+                 "PTE %"PRIpte" for %"PRIx64" doesn't match grant (%"PRIpte")\n",
+                 l1e_get_intpte(ol1e), addr,
+                 l1e_get_intpte(l1e_from_pfn(frame, grant_pte_flags)));
+        goto out_unlock;
+    }
+
+    if ( unlikely((l1e_get_flags(ol1e) ^ grant_pte_flags) &
+                  ~(_PAGE_AVAIL | PAGE_CACHE_ATTRS)) )
+        gdprintk(XENLOG_WARNING,
+                 "PTE flags %x for %"PRIx64" don't match grant (%x)\n",
+                 l1e_get_flags(ol1e), addr, grant_pte_flags);
+
+    if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn_x(gl1mfn), curr, 0) )
+        rc = GNTST_okay;
+
+ out_unlock:
+    page_unlock(page);
+ out_put:
+    put_page(page);
+ out_unmap:
+    unmap_domain_page(pl1e);
+
+ out:
+    /* If there was an error, we are still responsible for the stolen pte. */
+    if ( rc )
+        put_page_from_l1e(nl1e, currd);
+
+    return rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 09/23] x86/mm: add pv prefix to {set, destroy}_gdt
  2017-09-14 12:58 [PATCH v5 00/23] x86: refactor mm.c Wei Liu
                   ` (7 preceding siblings ...)
  2017-09-14 12:58 ` [PATCH v5 08/23] x86/mm: split out pv grant table code Wei Liu
@ 2017-09-14 12:58 ` Wei Liu
  2017-09-22 13:02   ` Jan Beulich
  2017-09-14 12:58 ` [PATCH v5 10/23] x86/mm: split out descriptor table manipulation code Wei Liu
                   ` (13 subsequent siblings)
  22 siblings, 1 reply; 57+ messages in thread
From: Wei Liu @ 2017-09-14 12:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Unfortunately they can't stay local to PV code because domain.c still
needs them. Change their names and fix up call sites. The code will be
moved later together with other descriptor table manipulation code.

Also move the declarations to asm-x86/pv/mm.h and provide stubs for the
!PV case.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
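Note: the header change follows the usual real-prototype-or-stub idiom,
so common code such as domain.c keeps calling pv_set_gdt() and
pv_destroy_gdt() without any #ifdef.  A toy, compilable sketch of the
idiom follows; pv_do_thing() is made up for illustration, not a real
Xen function.

    #include <errno.h>
    #include <stdio.h>

    #define CONFIG_PV            /* comment out to simulate a !PV build */

    struct vcpu { long id; };

    #ifdef CONFIG_PV
    /* The real implementation would live under pv/. */
    static long pv_do_thing(struct vcpu *v)
    {
        return v->id;
    }
    #else
    /* Stub: callers compile unchanged, PV support is simply absent. */
    static inline long pv_do_thing(struct vcpu *v)
    {
        (void)v;
        return -EINVAL;
    }
    #endif

    int main(void)
    {
        struct vcpu v = { .id = 42 };

        printf("%ld\n", pv_do_thing(&v));
        return 0;
    }
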
 xen/arch/x86/domain.c           | 11 ++++++-----
 xen/arch/x86/mm.c               | 10 ++++------
 xen/arch/x86/x86_64/compat/mm.c |  4 +++-
 xen/include/asm-x86/processor.h |  5 -----
 xen/include/asm-x86/pv/mm.h     | 10 ++++++++++
 5 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index dbddc536d3..e9367bd8aa 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -64,6 +64,7 @@
 #include <compat/vcpu.h>
 #include <asm/psr.h>
 #include <asm/pv/domain.h>
+#include <asm/pv/mm.h>
 
 DEFINE_PER_CPU(struct vcpu *, curr_vcpu);
 
@@ -992,7 +993,7 @@ int arch_set_info_guest(
         return rc;
 
     if ( !compat )
-        rc = (int)set_gdt(v, c.nat->gdt_frames, c.nat->gdt_ents);
+        rc = (int)pv_set_gdt(v, c.nat->gdt_frames, c.nat->gdt_ents);
     else
     {
         unsigned long gdt_frames[ARRAY_SIZE(v->arch.pv_vcpu.gdt_frames)];
@@ -1002,7 +1003,7 @@ int arch_set_info_guest(
             return -EINVAL;
         for ( i = 0; i < n; ++i )
             gdt_frames[i] = c.cmp->gdt_frames[i];
-        rc = (int)set_gdt(v, gdt_frames, c.cmp->gdt_ents);
+        rc = (int)pv_set_gdt(v, gdt_frames, c.cmp->gdt_ents);
     }
     if ( rc != 0 )
         return rc;
@@ -1101,7 +1102,7 @@ int arch_set_info_guest(
     {
         if ( cr3_page )
             put_page(cr3_page);
-        destroy_gdt(v);
+        pv_destroy_gdt(v);
         return rc;
     }
 
@@ -1153,7 +1154,7 @@ int arch_vcpu_reset(struct vcpu *v)
 {
     if ( is_pv_vcpu(v) )
     {
-        destroy_gdt(v);
+        pv_destroy_gdt(v);
         return vcpu_destroy_pagetables(v);
     }
 
@@ -1896,7 +1897,7 @@ int domain_relinquish_resources(struct domain *d)
                  * the LDT as it automatically gets squashed with the guest
                  * mappings.
                  */
-                destroy_gdt(v);
+                pv_destroy_gdt(v);
             }
         }
 
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 69a47d87d6..e505be7cf5 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3858,7 +3858,7 @@ long do_update_va_mapping_otherdomain(unsigned long va, u64 val64,
  * Descriptor Tables
  */
 
-void destroy_gdt(struct vcpu *v)
+void pv_destroy_gdt(struct vcpu *v)
 {
     l1_pgentry_t *pl1e;
     unsigned int i;
@@ -3877,9 +3877,7 @@ void destroy_gdt(struct vcpu *v)
 }
 
 
-long set_gdt(struct vcpu *v,
-             unsigned long *frames,
-             unsigned int entries)
+long pv_set_gdt(struct vcpu *v, unsigned long *frames, unsigned int entries)
 {
     struct domain *d = v->domain;
     l1_pgentry_t *pl1e;
@@ -3906,7 +3904,7 @@ long set_gdt(struct vcpu *v,
     }
 
     /* Tear down the old GDT. */
-    destroy_gdt(v);
+    pv_destroy_gdt(v);
 
     /* Install the new GDT. */
     v->arch.pv_vcpu.gdt_ents = entries;
@@ -3945,7 +3943,7 @@ long do_set_gdt(XEN_GUEST_HANDLE_PARAM(xen_ulong_t) frame_list,
 
     domain_lock(curr->domain);
 
-    if ( (ret = set_gdt(curr, frames, entries)) == 0 )
+    if ( (ret = pv_set_gdt(curr, frames, entries)) == 0 )
         flush_tlb_local();
 
     domain_unlock(curr->domain);
diff --git a/xen/arch/x86/x86_64/compat/mm.c b/xen/arch/x86/x86_64/compat/mm.c
index ef0ff86519..16ea2a80df 100644
--- a/xen/arch/x86/x86_64/compat/mm.c
+++ b/xen/arch/x86/x86_64/compat/mm.c
@@ -6,6 +6,8 @@
 #include <asm/mem_paging.h>
 #include <asm/mem_sharing.h>
 
+#include <asm/pv/mm.h>
+
 int compat_set_gdt(XEN_GUEST_HANDLE_PARAM(uint) frame_list, unsigned int entries)
 {
     unsigned int i, nr_pages = (entries + 511) / 512;
@@ -31,7 +33,7 @@ int compat_set_gdt(XEN_GUEST_HANDLE_PARAM(uint) frame_list, unsigned int entries
 
     domain_lock(current->domain);
 
-    if ( (ret = set_gdt(current, frames, entries)) == 0 )
+    if ( (ret = pv_set_gdt(current, frames, entries)) == 0 )
         flush_tlb_local();
 
     domain_unlock(current->domain);
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 8b39fb4be9..41a8d8c32f 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -467,11 +467,6 @@ extern void init_int80_direct_trap(struct vcpu *v);
 
 extern void write_ptbase(struct vcpu *v);
 
-void destroy_gdt(struct vcpu *d);
-long set_gdt(struct vcpu *d, 
-             unsigned long *frames, 
-             unsigned int entries);
-
 /* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */
 static always_inline void rep_nop(void)
 {
diff --git a/xen/include/asm-x86/pv/mm.h b/xen/include/asm-x86/pv/mm.h
index e251e1ef06..3ca24cc70a 100644
--- a/xen/include/asm-x86/pv/mm.h
+++ b/xen/include/asm-x86/pv/mm.h
@@ -25,14 +25,24 @@
 
 int pv_ro_page_fault(unsigned long addr, struct cpu_user_regs *regs);
 
+long pv_set_gdt(struct vcpu *d, unsigned long *frames, unsigned int entries);
+void pv_destroy_gdt(struct vcpu *d);
+
 #else
 
+#include <xen/errno.h>
+
 static inline int pv_ro_page_fault(unsigned long addr,
                                    struct cpu_user_regs *regs)
 {
     return 0;
 }
 
+static inline long pv_set_gdt(struct vcpu *d, unsigned long *frames,
+                              unsigned int entries)
+{ return -EINVAL; }
+static inline void pv_destroy_gdt(struct vcpu *d) {}
+
 #endif
 
 #endif /* __X86_PV_MM_H__ */
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 10/23] x86/mm: split out descriptor table manipulation code
  2017-09-14 12:58 [PATCH v5 00/23] x86: refactor mm.c Wei Liu
                   ` (8 preceding siblings ...)
  2017-09-14 12:58 ` [PATCH v5 09/23] x86/mm: add pv prefix to {set, destroy}_gdt Wei Liu
@ 2017-09-14 12:58 ` Wei Liu
  2017-09-22 13:07   ` Jan Beulich
  2017-09-14 12:58 ` [PATCH v5 11/23] x86/mm: move compat " Wei Liu
                   ` (12 subsequent siblings)
  22 siblings, 1 reply; 57+ messages in thread
From: Wei Liu @ 2017-09-14 12:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Move the code to pv/descriptor-tables.c. Change u64 to uint64_t while
moving. Use currd in do_update_descriptor.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
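Note: the "(entries + 511) / 512" expression that travels with this code
is a plain round-up division: 512 8-byte descriptors fit in one 4K page.
A standalone illustration of the rounding:

    #include <stdio.h>

    #define GDT_ENTRIES_PER_PAGE 512       /* 4096 / 8-byte descriptor */

    static unsigned int gdt_nr_pages(unsigned int entries)
    {
        /* Round up: a partially filled page still needs mapping. */
        return (entries + GDT_ENTRIES_PER_PAGE - 1) / GDT_ENTRIES_PER_PAGE;
    }

    int main(void)
    {
        printf("%u\n", gdt_nr_pages(512)); /* 1 */
        printf("%u\n", gdt_nr_pages(513)); /* 2 */
        return 0;
    }
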
 xen/arch/x86/mm.c                   | 155 -----------------------------
 xen/arch/x86/pv/Makefile            |   1 +
 xen/arch/x86/pv/descriptor-tables.c | 192 ++++++++++++++++++++++++++++++++++++
 3 files changed, 193 insertions(+), 155 deletions(-)
 create mode 100644 xen/arch/x86/pv/descriptor-tables.c

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e505be7cf5..bfdba34468 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3852,161 +3852,6 @@ long do_update_va_mapping_otherdomain(unsigned long va, u64 val64,
     return rc;
 }
 
-
-
-/*************************
- * Descriptor Tables
- */
-
-void pv_destroy_gdt(struct vcpu *v)
-{
-    l1_pgentry_t *pl1e;
-    unsigned int i;
-    unsigned long pfn, zero_pfn = PFN_DOWN(__pa(zero_page));
-
-    v->arch.pv_vcpu.gdt_ents = 0;
-    pl1e = pv_gdt_ptes(v);
-    for ( i = 0; i < FIRST_RESERVED_GDT_PAGE; i++ )
-    {
-        pfn = l1e_get_pfn(pl1e[i]);
-        if ( (l1e_get_flags(pl1e[i]) & _PAGE_PRESENT) && pfn != zero_pfn )
-            put_page_and_type(mfn_to_page(_mfn(pfn)));
-        l1e_write(&pl1e[i], l1e_from_pfn(zero_pfn, __PAGE_HYPERVISOR_RO));
-        v->arch.pv_vcpu.gdt_frames[i] = 0;
-    }
-}
-
-
-long pv_set_gdt(struct vcpu *v, unsigned long *frames, unsigned int entries)
-{
-    struct domain *d = v->domain;
-    l1_pgentry_t *pl1e;
-    /* NB. There are 512 8-byte entries per GDT page. */
-    unsigned int i, nr_pages = (entries + 511) / 512;
-
-    if ( entries > FIRST_RESERVED_GDT_ENTRY )
-        return -EINVAL;
-
-    /* Check the pages in the new GDT. */
-    for ( i = 0; i < nr_pages; i++ )
-    {
-        struct page_info *page;
-
-        page = get_page_from_gfn(d, frames[i], NULL, P2M_ALLOC);
-        if ( !page )
-            goto fail;
-        if ( !get_page_type(page, PGT_seg_desc_page) )
-        {
-            put_page(page);
-            goto fail;
-        }
-        frames[i] = mfn_x(page_to_mfn(page));
-    }
-
-    /* Tear down the old GDT. */
-    pv_destroy_gdt(v);
-
-    /* Install the new GDT. */
-    v->arch.pv_vcpu.gdt_ents = entries;
-    pl1e = pv_gdt_ptes(v);
-    for ( i = 0; i < nr_pages; i++ )
-    {
-        v->arch.pv_vcpu.gdt_frames[i] = frames[i];
-        l1e_write(&pl1e[i], l1e_from_pfn(frames[i], __PAGE_HYPERVISOR_RW));
-    }
-
-    return 0;
-
- fail:
-    while ( i-- > 0 )
-    {
-        put_page_and_type(mfn_to_page(_mfn(frames[i])));
-    }
-    return -EINVAL;
-}
-
-
-long do_set_gdt(XEN_GUEST_HANDLE_PARAM(xen_ulong_t) frame_list,
-                unsigned int entries)
-{
-    int nr_pages = (entries + 511) / 512;
-    unsigned long frames[16];
-    struct vcpu *curr = current;
-    long ret;
-
-    /* Rechecked in set_gdt, but ensures a sane limit for copy_from_user(). */
-    if ( entries > FIRST_RESERVED_GDT_ENTRY )
-        return -EINVAL;
-
-    if ( copy_from_guest(frames, frame_list, nr_pages) )
-        return -EFAULT;
-
-    domain_lock(curr->domain);
-
-    if ( (ret = pv_set_gdt(curr, frames, entries)) == 0 )
-        flush_tlb_local();
-
-    domain_unlock(curr->domain);
-
-    return ret;
-}
-
-
-long do_update_descriptor(u64 pa, u64 desc)
-{
-    struct domain *dom = current->domain;
-    unsigned long gmfn = pa >> PAGE_SHIFT;
-    unsigned long mfn;
-    unsigned int  offset;
-    struct desc_struct *gdt_pent, d;
-    struct page_info *page;
-    long ret = -EINVAL;
-
-    offset = ((unsigned int)pa & ~PAGE_MASK) / sizeof(struct desc_struct);
-
-    *(u64 *)&d = desc;
-
-    page = get_page_from_gfn(dom, gmfn, NULL, P2M_ALLOC);
-    if ( (((unsigned int)pa % sizeof(struct desc_struct)) != 0) ||
-         !page ||
-         !check_descriptor(dom, &d) )
-    {
-        if ( page )
-            put_page(page);
-        return -EINVAL;
-    }
-    mfn = mfn_x(page_to_mfn(page));
-
-    /* Check if the given frame is in use in an unsafe context. */
-    switch ( page->u.inuse.type_info & PGT_type_mask )
-    {
-    case PGT_seg_desc_page:
-        if ( unlikely(!get_page_type(page, PGT_seg_desc_page)) )
-            goto out;
-        break;
-    default:
-        if ( unlikely(!get_page_type(page, PGT_writable_page)) )
-            goto out;
-        break;
-    }
-
-    paging_mark_dirty(dom, _mfn(mfn));
-
-    /* All is good so make the update. */
-    gdt_pent = map_domain_page(_mfn(mfn));
-    write_atomic((uint64_t *)&gdt_pent[offset], *(uint64_t *)&d);
-    unmap_domain_page(gdt_pent);
-
-    put_page_type(page);
-
-    ret = 0; /* success */
-
- out:
-    put_page(page);
-
-    return ret;
-}
-
 typedef struct e820entry e820entry_t;
 DEFINE_XEN_GUEST_HANDLE(e820entry_t);
 
diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile
index a692ee6432..bac2792aa2 100644
--- a/xen/arch/x86/pv/Makefile
+++ b/xen/arch/x86/pv/Makefile
@@ -1,4 +1,5 @@
 obj-y += callback.o
+obj-y += descriptor-tables.o
 obj-y += domain.o
 obj-y += emulate.o
 obj-y += emul-gate-op.o
diff --git a/xen/arch/x86/pv/descriptor-tables.c b/xen/arch/x86/pv/descriptor-tables.c
new file mode 100644
index 0000000000..04fb37f2ce
--- /dev/null
+++ b/xen/arch/x86/pv/descriptor-tables.c
@@ -0,0 +1,192 @@
+/*
+ * arch/x86/pv/descriptor-tables.c
+ *
+ * Descriptor table manipulation code for PV guests
+ *
+ * Copyright (c) 2002-2005 K A Fraser
+ * Copyright (c) 2004 Christian Limpach
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/guest_access.h>
+#include <xen/hypercall.h>
+
+#include <asm/p2m.h>
+#include <asm/pv/mm.h>
+
+/* Override macros from asm/page.h to make them work with mfn_t */
+#undef mfn_to_page
+#define mfn_to_page(mfn) __mfn_to_page(mfn_x(mfn))
+#undef page_to_mfn
+#define page_to_mfn(pg) _mfn(__page_to_mfn(pg))
+
+/*******************
+ * Descriptor Tables
+ */
+
+void pv_destroy_gdt(struct vcpu *v)
+{
+    l1_pgentry_t *pl1e;
+    unsigned int i;
+    unsigned long pfn, zero_pfn = PFN_DOWN(__pa(zero_page));
+
+    v->arch.pv_vcpu.gdt_ents = 0;
+    pl1e = pv_gdt_ptes(v);
+    for ( i = 0; i < FIRST_RESERVED_GDT_PAGE; i++ )
+    {
+        pfn = l1e_get_pfn(pl1e[i]);
+        if ( (l1e_get_flags(pl1e[i]) & _PAGE_PRESENT) && pfn != zero_pfn )
+            put_page_and_type(mfn_to_page(_mfn(pfn)));
+        l1e_write(&pl1e[i], l1e_from_pfn(zero_pfn, __PAGE_HYPERVISOR_RO));
+        v->arch.pv_vcpu.gdt_frames[i] = 0;
+    }
+}
+
+long pv_set_gdt(struct vcpu *v, unsigned long *frames, unsigned int entries)
+{
+    struct domain *d = v->domain;
+    l1_pgentry_t *pl1e;
+    /* NB. There are 512 8-byte entries per GDT page. */
+    unsigned int i, nr_pages = (entries + 511) / 512;
+
+    if ( entries > FIRST_RESERVED_GDT_ENTRY )
+        return -EINVAL;
+
+    /* Check the pages in the new GDT. */
+    for ( i = 0; i < nr_pages; i++ )
+    {
+        struct page_info *page;
+
+        page = get_page_from_gfn(d, frames[i], NULL, P2M_ALLOC);
+        if ( !page )
+            goto fail;
+        if ( !get_page_type(page, PGT_seg_desc_page) )
+        {
+            put_page(page);
+            goto fail;
+        }
+        frames[i] = mfn_x(page_to_mfn(page));
+    }
+
+    /* Tear down the old GDT. */
+    pv_destroy_gdt(v);
+
+    /* Install the new GDT. */
+    v->arch.pv_vcpu.gdt_ents = entries;
+    pl1e = pv_gdt_ptes(v);
+    for ( i = 0; i < nr_pages; i++ )
+    {
+        v->arch.pv_vcpu.gdt_frames[i] = frames[i];
+        l1e_write(&pl1e[i], l1e_from_pfn(frames[i], __PAGE_HYPERVISOR_RW));
+    }
+
+    return 0;
+
+ fail:
+    while ( i-- > 0 )
+    {
+        put_page_and_type(mfn_to_page(_mfn(frames[i])));
+    }
+    return -EINVAL;
+}
+
+long do_set_gdt(XEN_GUEST_HANDLE_PARAM(xen_ulong_t) frame_list,
+                unsigned int entries)
+{
+    int nr_pages = (entries + 511) / 512;
+    unsigned long frames[16];
+    struct vcpu *curr = current;
+    long ret;
+
+    /* Rechecked in pv_set_gdt; ensures a sane limit for copy_from_user(). */
+    if ( entries > FIRST_RESERVED_GDT_ENTRY )
+        return -EINVAL;
+
+    if ( copy_from_guest(frames, frame_list, nr_pages) )
+        return -EFAULT;
+
+    domain_lock(curr->domain);
+
+    if ( (ret = pv_set_gdt(curr, frames, entries)) == 0 )
+        flush_tlb_local();
+
+    domain_unlock(curr->domain);
+
+    return ret;
+}
+
+long do_update_descriptor(uint64_t pa, uint64_t desc)
+{
+    struct domain *currd = current->domain;
+    unsigned long gmfn = pa >> PAGE_SHIFT;
+    unsigned long mfn;
+    unsigned int  offset;
+    struct desc_struct *gdt_pent, d;
+    struct page_info *page;
+    long ret = -EINVAL;
+
+    offset = ((unsigned int)pa & ~PAGE_MASK) / sizeof(struct desc_struct);
+
+    *(uint64_t *)&d = desc;
+
+    page = get_page_from_gfn(currd, gmfn, NULL, P2M_ALLOC);
+    if ( (((unsigned int)pa % sizeof(struct desc_struct)) != 0) ||
+         !page ||
+         !check_descriptor(currd, &d) )
+    {
+        if ( page )
+            put_page(page);
+        return -EINVAL;
+    }
+    mfn = mfn_x(page_to_mfn(page));
+
+    /* Check if the given frame is in use in an unsafe context. */
+    switch ( page->u.inuse.type_info & PGT_type_mask )
+    {
+    case PGT_seg_desc_page:
+        if ( unlikely(!get_page_type(page, PGT_seg_desc_page)) )
+            goto out;
+        break;
+    default:
+        if ( unlikely(!get_page_type(page, PGT_writable_page)) )
+            goto out;
+        break;
+    }
+
+    paging_mark_dirty(currd, _mfn(mfn));
+
+    /* All is good so make the update. */
+    gdt_pent = map_domain_page(_mfn(mfn));
+    write_atomic((uint64_t *)&gdt_pent[offset], *(uint64_t *)&d);
+    unmap_domain_page(gdt_pent);
+
+    put_page_type(page);
+
+    ret = 0; /* success */
+
+ out:
+    put_page(page);
+
+    return ret;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 11/23] x86/mm: move compat descriptor table manipulation code
  2017-09-14 12:58 [PATCH v5 00/23] x86: refactor mm.c Wei Liu
                   ` (9 preceding siblings ...)
  2017-09-14 12:58 ` [PATCH v5 10/23] x86/mm: split out descriptor table manipulation code Wei Liu
@ 2017-09-14 12:58 ` Wei Liu
  2017-09-22 13:09   ` Jan Beulich
  2017-09-14 12:58 ` [PATCH v5 12/23] x86/mm: move and rename map_ldt_shadow_page Wei Liu
                   ` (11 subsequent siblings)
  22 siblings, 1 reply; 57+ messages in thread
From: Wei Liu @ 2017-09-14 12:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Move them alongside the non-compat variants.

Change u{32,64} to uint{32,64}_t while moving.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
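Note: the compat wrapper's only job is to reassemble each 64-bit
argument from the pair of 32-bit halves a 32-bit guest passes, and the
cast before the shift is what keeps the high half from being lost.  A
standalone sketch:

    #include <stdint.h>
    #include <stdio.h>

    static uint64_t join_halves(uint32_t lo, uint32_t hi)
    {
        /* Without the cast, "hi << 32" would overflow a 32-bit type. */
        return lo | ((uint64_t)hi << 32);
    }

    int main(void)
    {
        /* hi = 0x1, lo = 0x8 -> 0x100000008 */
        printf("%#llx\n", (unsigned long long)join_halves(0x8, 0x1));
        return 0;
    }
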
 xen/arch/x86/pv/descriptor-tables.c | 40 +++++++++++++++++++++++++++++++++++++
 xen/arch/x86/x86_64/compat/mm.c     | 39 ------------------------------------
 2 files changed, 40 insertions(+), 39 deletions(-)

diff --git a/xen/arch/x86/pv/descriptor-tables.c b/xen/arch/x86/pv/descriptor-tables.c
index 04fb37f2ce..52ffaa88d8 100644
--- a/xen/arch/x86/pv/descriptor-tables.c
+++ b/xen/arch/x86/pv/descriptor-tables.c
@@ -181,6 +181,46 @@ long do_update_descriptor(uint64_t pa, uint64_t desc)
     return ret;
 }
 
+int compat_set_gdt(XEN_GUEST_HANDLE_PARAM(uint) frame_list, unsigned int entries)
+{
+    unsigned int i, nr_pages = (entries + 511) / 512;
+    unsigned long frames[16];
+    long ret;
+
+    /* Rechecked in pv_set_gdt; ensures a sane limit for copy_from_user(). */
+    if ( entries > FIRST_RESERVED_GDT_ENTRY )
+        return -EINVAL;
+
+    if ( !guest_handle_okay(frame_list, nr_pages) )
+        return -EFAULT;
+
+    for ( i = 0; i < nr_pages; ++i )
+    {
+        unsigned int frame;
+
+        if ( __copy_from_guest(&frame, frame_list, 1) )
+            return -EFAULT;
+        frames[i] = frame;
+        guest_handle_add_offset(frame_list, 1);
+    }
+
+    domain_lock(current->domain);
+
+    if ( (ret = pv_set_gdt(current, frames, entries)) == 0 )
+        flush_tlb_local();
+
+    domain_unlock(current->domain);
+
+    return ret;
+}
+
+int compat_update_descriptor(uint32_t pa_lo, uint32_t pa_hi,
+                             uint32_t desc_lo, uint32_t desc_hi)
+{
+    return do_update_descriptor(pa_lo | ((uint64_t)pa_hi << 32),
+                                desc_lo | ((uint64_t)desc_hi << 32));
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/x86_64/compat/mm.c b/xen/arch/x86/x86_64/compat/mm.c
index 16ea2a80df..c2aa6f2fdb 100644
--- a/xen/arch/x86/x86_64/compat/mm.c
+++ b/xen/arch/x86/x86_64/compat/mm.c
@@ -8,45 +8,6 @@
 
 #include <asm/pv/mm.h>
 
-int compat_set_gdt(XEN_GUEST_HANDLE_PARAM(uint) frame_list, unsigned int entries)
-{
-    unsigned int i, nr_pages = (entries + 511) / 512;
-    unsigned long frames[16];
-    long ret;
-
-    /* Rechecked in set_gdt, but ensures a sane limit for copy_from_user(). */
-    if ( entries > FIRST_RESERVED_GDT_ENTRY )
-        return -EINVAL;
-
-    if ( !guest_handle_okay(frame_list, nr_pages) )
-        return -EFAULT;
-
-    for ( i = 0; i < nr_pages; ++i )
-    {
-        unsigned int frame;
-
-        if ( __copy_from_guest(&frame, frame_list, 1) )
-            return -EFAULT;
-        frames[i] = frame;
-        guest_handle_add_offset(frame_list, 1);
-    }
-
-    domain_lock(current->domain);
-
-    if ( (ret = pv_set_gdt(current, frames, entries)) == 0 )
-        flush_tlb_local();
-
-    domain_unlock(current->domain);
-
-    return ret;
-}
-
-int compat_update_descriptor(u32 pa_lo, u32 pa_hi, u32 desc_lo, u32 desc_hi)
-{
-    return do_update_descriptor(pa_lo | ((u64)pa_hi << 32),
-                                desc_lo | ((u64)desc_hi << 32));
-}
-
 int compat_arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct compat_machphys_mfn_list xmml;
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 12/23] x86/mm: move and rename map_ldt_shadow_page
  2017-09-14 12:58 [PATCH v5 00/23] x86: refactor mm.c Wei Liu
                   ` (10 preceding siblings ...)
  2017-09-14 12:58 ` [PATCH v5 11/23] x86/mm: move compat " Wei Liu
@ 2017-09-14 12:58 ` Wei Liu
  2017-09-22 13:18   ` Jan Beulich
  2017-09-14 12:58 ` [PATCH v5 13/23] x86/mm: factor out pv_arch_init_memory Wei Liu
                   ` (10 subsequent siblings)
  22 siblings, 1 reply; 57+ messages in thread
From: Wei Liu @ 2017-09-14 12:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Add a pv_ prefix to it, move it to pv/mm.c and fix up the call sites.

Take the chance to change v to curr and d to currd.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
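Note: guest_get_eff_kern_l1e(), which moves along with this, is built
around a flip/read/flip-back pattern over toggle_guest_mode().  A toy
model of that control flow follows; the vcpu layout and the PTE read
are simplified stand-ins for Xen's real ones.

    #include <stdbool.h>
    #include <stdio.h>

    struct vcpu { bool kernel_mode; };

    static void toggle_guest_mode(struct vcpu *v)
    {
        /* In Xen this switches between user and kernel page tables. */
        v->kernel_mode = !v->kernel_mode;
    }

    static int read_l1e(const struct vcpu *v)
    {
        return v->kernel_mode ? 0x63 : 0;  /* pretend PTE lookup */
    }

    static int get_eff_kern_l1e(struct vcpu *v)
    {
        bool user_mode = !v->kernel_mode;
        int l1e;

        if ( user_mode )
            toggle_guest_mode(v);          /* force the kernel view */

        l1e = read_l1e(v);

        if ( user_mode )
            toggle_guest_mode(v);          /* restore the user view */

        return l1e;
    }

    int main(void)
    {
        struct vcpu v = { .kernel_mode = false };

        printf("%#x\n", get_eff_kern_l1e(&v)); /* kernel view is read */
        return 0;
    }
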
 xen/arch/x86/mm.c           | 73 -------------------------------------------
 xen/arch/x86/pv/mm.c        | 75 +++++++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/traps.c        |  4 +--
 xen/include/asm-x86/mm.h    |  2 --
 xen/include/asm-x86/pv/mm.h |  4 +++
 5 files changed, 81 insertions(+), 77 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index bfdba34468..8e25d15631 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -526,27 +526,6 @@ void update_cr3(struct vcpu *v)
     make_cr3(v, cr3_mfn);
 }
 
-/*
- * Read the guest's l1e that maps this address, from the kernel-mode
- * page tables.
- */
-static l1_pgentry_t guest_get_eff_kern_l1e(unsigned long linear)
-{
-    struct vcpu *curr = current;
-    const bool user_mode = !(curr->arch.flags & TF_kernel_mode);
-    l1_pgentry_t l1e;
-
-    if ( user_mode )
-        toggle_guest_mode(curr);
-
-    l1e = guest_get_eff_l1e(linear);
-
-    if ( user_mode )
-        toggle_guest_mode(curr);
-
-    return l1e;
-}
-
 static inline void page_set_tlbflush_timestamp(struct page_info *page)
 {
     /*
@@ -615,58 +594,6 @@ static int alloc_segdesc_page(struct page_info *page)
     return i == 512 ? 0 : -EINVAL;
 }
 
-
-/*
- * Map a guest's LDT page (covering the byte at @offset from start of the LDT)
- * into Xen's virtual range.  Returns true if the mapping changed, false
- * otherwise.
- */
-bool map_ldt_shadow_page(unsigned int offset)
-{
-    struct vcpu *v = current;
-    struct domain *d = v->domain;
-    struct page_info *page;
-    l1_pgentry_t gl1e, *pl1e;
-    unsigned long linear = v->arch.pv_vcpu.ldt_base + offset;
-
-    BUG_ON(unlikely(in_irq()));
-
-    /*
-     * Hardware limit checking should guarantee this property.  NB. This is
-     * safe as updates to the LDT can only be made by MMUEXT_SET_LDT to the
-     * current vcpu, and vcpu_reset() will block until this vcpu has been
-     * descheduled before continuing.
-     */
-    ASSERT((offset >> 3) <= v->arch.pv_vcpu.ldt_ents);
-
-    if ( is_pv_32bit_domain(d) )
-        linear = (uint32_t)linear;
-
-    gl1e = guest_get_eff_kern_l1e(linear);
-    if ( unlikely(!(l1e_get_flags(gl1e) & _PAGE_PRESENT)) )
-        return false;
-
-    page = get_page_from_gfn(d, l1e_get_pfn(gl1e), NULL, P2M_ALLOC);
-    if ( unlikely(!page) )
-        return false;
-
-    if ( unlikely(!get_page_type(page, PGT_seg_desc_page)) )
-    {
-        put_page(page);
-        return false;
-    }
-
-    pl1e = &pv_ldt_ptes(v)[offset >> PAGE_SHIFT];
-    l1e_add_flags(gl1e, _PAGE_RW);
-
-    spin_lock(&v->arch.pv_vcpu.shadow_ldt_lock);
-    l1e_write(pl1e, gl1e);
-    v->arch.pv_vcpu.shadow_ldt_mapcnt++;
-    spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock);
-
-    return true;
-}
-
 static int get_page_and_type_from_mfn(
     mfn_t mfn, unsigned long type, struct domain *d,
     int partial, int preemptible)
diff --git a/xen/arch/x86/pv/mm.c b/xen/arch/x86/pv/mm.c
index 4bfa322788..6890e80efd 100644
--- a/xen/arch/x86/pv/mm.c
+++ b/xen/arch/x86/pv/mm.c
@@ -22,6 +22,9 @@
 #include <xen/guest_access.h>
 
 #include <asm/current.h>
+#include <asm/p2m.h>
+
+#include "mm.h"
 
 /* Override macros from asm/page.h to make them work with mfn_t */
 #undef mfn_to_page
@@ -58,6 +61,78 @@ l1_pgentry_t *map_guest_l1e(unsigned long linear, mfn_t *gl1mfn)
     return (l1_pgentry_t *)map_domain_page(*gl1mfn) + l1_table_offset(linear);
 }
 
+/*
+ * Read the guest's l1e that maps this address, from the kernel-mode
+ * page tables.
+ */
+static l1_pgentry_t guest_get_eff_kern_l1e(unsigned long linear)
+{
+    struct vcpu *curr = current;
+    const bool user_mode = !(curr->arch.flags & TF_kernel_mode);
+    l1_pgentry_t l1e;
+
+    if ( user_mode )
+        toggle_guest_mode(curr);
+
+    l1e = guest_get_eff_l1e(linear);
+
+    if ( user_mode )
+        toggle_guest_mode(curr);
+
+    return l1e;
+}
+
+/*
+ * Map a guest's LDT page (covering the byte at @offset from start of the LDT)
+ * into Xen's virtual range.  Returns true if the mapping changed, false
+ * otherwise.
+ */
+bool pv_map_ldt_shadow_page(unsigned int offset)
+{
+    struct vcpu *curr = current;
+    struct domain *currd = curr->domain;
+    struct page_info *page;
+    l1_pgentry_t gl1e, *pl1e;
+    unsigned long linear = curr->arch.pv_vcpu.ldt_base + offset;
+
+    BUG_ON(unlikely(in_irq()));
+
+    /*
+     * Hardware limit checking should guarantee this property.  NB. This is
+     * safe as updates to the LDT can only be made by MMUEXT_SET_LDT to the
+     * current vcpu, and vcpu_reset() will block until this vcpu has been
+     * descheduled before continuing.
+     */
+    ASSERT((offset >> 3) <= curr->arch.pv_vcpu.ldt_ents);
+
+    if ( is_pv_32bit_domain(currd) )
+        linear = (uint32_t)linear;
+
+    gl1e = guest_get_eff_kern_l1e(linear);
+    if ( unlikely(!(l1e_get_flags(gl1e) & _PAGE_PRESENT)) )
+        return false;
+
+    page = get_page_from_gfn(currd, l1e_get_pfn(gl1e), NULL, P2M_ALLOC);
+    if ( unlikely(!page) )
+        return false;
+
+    if ( unlikely(!get_page_type(page, PGT_seg_desc_page)) )
+    {
+        put_page(page);
+        return false;
+    }
+
+    pl1e = &pv_ldt_ptes(curr)[offset >> PAGE_SHIFT];
+    l1e_add_flags(gl1e, _PAGE_RW);
+
+    spin_lock(&curr->arch.pv_vcpu.shadow_ldt_lock);
+    l1e_write(pl1e, gl1e);
+    curr->arch.pv_vcpu.shadow_ldt_mapcnt++;
+    spin_unlock(&curr->arch.pv_vcpu.shadow_ldt_lock);
+
+    return true;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index d84db4acda..d8feef2942 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -1101,7 +1101,7 @@ static int handle_gdt_ldt_mapping_fault(unsigned long offset,
     /*
      * If the fault is in another vcpu's area, it cannot be due to
      * a GDT/LDT descriptor load. Thus we can reasonably exit immediately, and
-     * indeed we have to since map_ldt_shadow_page() works correctly only on
+     * indeed we have to since pv_map_ldt_shadow_page() works correctly only on
      * accesses to a vcpu's own area.
      */
     if ( vcpu_area != curr->vcpu_id )
@@ -1113,7 +1113,7 @@ static int handle_gdt_ldt_mapping_fault(unsigned long offset,
     if ( likely(is_ldt_area) )
     {
         /* LDT fault: Copy a mapping from the guest's LDT, if it is valid. */
-        if ( likely(map_ldt_shadow_page(offset)) )
+        if ( likely(pv_map_ldt_shadow_page(offset)) )
         {
             if ( guest_mode(regs) )
                 trace_trap_two_addr(TRC_PV_GDT_LDT_MAPPING_FAULT,
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index a48d75d434..8a56bed454 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -562,8 +562,6 @@ long subarch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg);
 int compat_arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void));
 int compat_subarch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void));
 
-bool map_ldt_shadow_page(unsigned int);
-
 #define NIL(type) ((type *)-sizeof(type))
 #define IS_NIL(ptr) (!((uintptr_t)(ptr) + sizeof(*(ptr))))
 
diff --git a/xen/include/asm-x86/pv/mm.h b/xen/include/asm-x86/pv/mm.h
index 3ca24cc70a..47223e38eb 100644
--- a/xen/include/asm-x86/pv/mm.h
+++ b/xen/include/asm-x86/pv/mm.h
@@ -28,6 +28,8 @@ int pv_ro_page_fault(unsigned long addr, struct cpu_user_regs *regs);
 long pv_set_gdt(struct vcpu *d, unsigned long *frames, unsigned int entries);
 void pv_destroy_gdt(struct vcpu *d);
 
+bool pv_map_ldt_shadow_page(unsigned int off);
+
 #else
 
 #include <xen/errno.h>
@@ -43,6 +45,8 @@ static inline long pv_set_gdt(struct vcpu *d, unsigned long *frames,
 { return -EINVAL; }
 static inline void pv_destroy_gdt(struct vcpu *d) {}
 
+static inline bool pv_map_ldt_shadow_page(unsigned int off) { return false; }
+
 #endif
 
 #endif /* __X86_PV_MM_H__ */
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 13/23] x86/mm: factor out pv_arch_init_memory
  2017-09-14 12:58 [PATCH v5 00/23] x86: refactor mm.c Wei Liu
                   ` (11 preceding siblings ...)
  2017-09-14 12:58 ` [PATCH v5 12/23] x86/mm: move and rename map_ldt_shadow_page Wei Liu
@ 2017-09-14 12:58 ` Wei Liu
  2017-09-22 13:20   ` Jan Beulich
  2017-09-14 12:58 ` [PATCH v5 14/23] x86/mm: move PV l4 table setup code Wei Liu
                   ` (9 subsequent siblings)
  22 siblings, 1 reply; 57+ messages in thread
From: Wei Liu @ 2017-09-14 12:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Move the split L4 setup code into a new function, pv_arch_init_memory(),
which can then be moved to pv/ in a later patch.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
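Note: the highmem_start special case keys off whether split_va and
split_va - 1 land in the same L4 slot.  The index computation itself is
just bits 47:39 of the address; a standalone sketch mirroring
l4_table_offset():

    #include <stdint.h>
    #include <stdio.h>

    #define L4_PAGETABLE_SHIFT 39          /* 4-level paging: bits 47:39 */
    #define L4_ENTRIES         512

    static unsigned int l4_table_offset(uint64_t va)
    {
        return (va >> L4_PAGETABLE_SHIFT) & (L4_ENTRIES - 1);
    }

    int main(void)
    {
        /* Addresses one byte apart can straddle an L4 boundary. */
        uint64_t va = 1ULL << L4_PAGETABLE_SHIFT;   /* 512GiB */

        printf("slot(va)     = %u\n", l4_table_offset(va));     /* 1 */
        printf("slot(va - 1) = %u\n", l4_table_offset(va - 1)); /* 0 */
        return 0;
    }
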
 xen/arch/x86/mm.c | 73 ++++++++++++++++++++++++++++++-------------------------
 1 file changed, 40 insertions(+), 33 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 8e25d15631..93ca075698 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -249,6 +249,45 @@ static l4_pgentry_t __read_mostly split_l4e;
 #define root_pgt_pv_xen_slots ROOT_PAGETABLE_PV_XEN_SLOTS
 #endif
 
+static void pv_arch_init_memory(void)
+{
+#ifndef NDEBUG
+    unsigned int i;
+
+    if ( highmem_start )
+    {
+        unsigned long split_va = (unsigned long)__va(highmem_start);
+
+        if ( split_va < HYPERVISOR_VIRT_END &&
+             split_va - 1 == (unsigned long)__va(highmem_start - 1) )
+        {
+            root_pgt_pv_xen_slots = l4_table_offset(split_va) -
+                                    ROOT_PAGETABLE_FIRST_XEN_SLOT;
+            ASSERT(root_pgt_pv_xen_slots < ROOT_PAGETABLE_PV_XEN_SLOTS);
+            if ( l4_table_offset(split_va) == l4_table_offset(split_va - 1) )
+            {
+                l3_pgentry_t *l3tab = alloc_xen_pagetable();
+
+                if ( l3tab )
+                {
+                    const l3_pgentry_t *l3idle =
+                        l4e_to_l3e(idle_pg_table[l4_table_offset(split_va)]);
+
+                    for ( i = 0; i < l3_table_offset(split_va); ++i )
+                        l3tab[i] = l3idle[i];
+                    for ( ; i < L3_PAGETABLE_ENTRIES; ++i )
+                        l3tab[i] = l3e_empty();
+                    split_l4e = l4e_from_pfn(virt_to_mfn(l3tab),
+                                             __PAGE_HYPERVISOR_RW);
+                }
+                else
+                    ++root_pgt_pv_xen_slots;
+            }
+        }
+    }
+#endif
+}
+
 void __init arch_init_memory(void)
 {
     unsigned long i, pfn, rstart_pfn, rend_pfn, iostart_pfn, ioend_pfn;
@@ -344,39 +383,7 @@ void __init arch_init_memory(void)
 
     mem_sharing_init();
 
-#ifndef NDEBUG
-    if ( highmem_start )
-    {
-        unsigned long split_va = (unsigned long)__va(highmem_start);
-
-        if ( split_va < HYPERVISOR_VIRT_END &&
-             split_va - 1 == (unsigned long)__va(highmem_start - 1) )
-        {
-            root_pgt_pv_xen_slots = l4_table_offset(split_va) -
-                                    ROOT_PAGETABLE_FIRST_XEN_SLOT;
-            ASSERT(root_pgt_pv_xen_slots < ROOT_PAGETABLE_PV_XEN_SLOTS);
-            if ( l4_table_offset(split_va) == l4_table_offset(split_va - 1) )
-            {
-                l3_pgentry_t *l3tab = alloc_xen_pagetable();
-
-                if ( l3tab )
-                {
-                    const l3_pgentry_t *l3idle =
-                        l4e_to_l3e(idle_pg_table[l4_table_offset(split_va)]);
-
-                    for ( i = 0; i < l3_table_offset(split_va); ++i )
-                        l3tab[i] = l3idle[i];
-                    for ( ; i < L3_PAGETABLE_ENTRIES; ++i )
-                        l3tab[i] = l3e_empty();
-                    split_l4e = l4e_from_pfn(virt_to_mfn(l3tab),
-                                             __PAGE_HYPERVISOR_RW);
-                }
-                else
-                    ++root_pgt_pv_xen_slots;
-            }
-        }
-    }
-#endif
+    pv_arch_init_memory();
 }
 
 int page_is_ram_type(unsigned long mfn, unsigned long mem_type)
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 14/23] x86/mm: move PV l4 table setup code
  2017-09-14 12:58 [PATCH v5 00/23] x86: refactor mm.c Wei Liu
                   ` (12 preceding siblings ...)
  2017-09-14 12:58 ` [PATCH v5 13/23] x86/mm: factor out pv_arch_init_memory Wei Liu
@ 2017-09-14 12:58 ` Wei Liu
  2017-09-22 13:23   ` Jan Beulich
  2017-09-14 12:58 ` [PATCH v5 15/23] x86/mm: move declaration of new_guest_cr3 to local pv/mm.h Wei Liu
                   ` (8 subsequent siblings)
  22 siblings, 1 reply; 57+ messages in thread
From: Wei Liu @ 2017-09-14 12:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Move the code to pv/mm.c. Export pv_arch_init_memory via the global
asm-x86/pv/mm.h and init_guest_l4_table via the local pv/mm.h.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
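Note: for anyone following the move, the heart of init_guest_l4_table()
is a memcpy of the reserved Xen slots from the idle page table into the
guest's L4.  A toy model follows; the slot numbers and entry type are
simplified stand-ins for Xen's real constants.

    #include <stdio.h>
    #include <string.h>

    #define L4_ENTRIES     512
    #define FIRST_XEN_SLOT 256             /* stand-in slot number */
    #define PV_XEN_SLOTS    16             /* stand-in slot count */

    typedef unsigned long l4e_t;           /* stand-in for l4_pgentry_t */

    static l4e_t idle_l4[L4_ENTRIES];

    static void init_guest_l4(l4e_t guest_l4[L4_ENTRIES])
    {
        /* Xen-private mappings are shared verbatim with PV guests. */
        memcpy(&guest_l4[FIRST_XEN_SLOT], &idle_l4[FIRST_XEN_SLOT],
               PV_XEN_SLOTS * sizeof(l4e_t));
    }

    int main(void)
    {
        l4e_t guest_l4[L4_ENTRIES] = { 0 };

        idle_l4[FIRST_XEN_SLOT] = 0x1063;  /* pretend hypervisor entry */
        init_guest_l4(guest_l4);
        printf("%#lx\n", guest_l4[FIRST_XEN_SLOT]);
        return 0;
    }
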
 xen/arch/x86/mm.c            | 82 +-------------------------------------------
 xen/arch/x86/pv/dom0_build.c |  2 ++
 xen/arch/x86/pv/domain.c     |  5 +++
 xen/arch/x86/pv/mm.c         | 82 ++++++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/pv/mm.h         |  3 ++
 xen/include/asm-x86/mm.h     |  2 --
 xen/include/asm-x86/pv/mm.h  |  4 +++
 7 files changed, 97 insertions(+), 83 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 93ca075698..3a919c19b8 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -125,6 +125,7 @@
 
 #include <asm/hvm/grant_table.h>
 #include <asm/pv/grant_table.h>
+#include <asm/pv/mm.h>
 
 #include "pv/mm.h"
 
@@ -241,53 +242,6 @@ void __init init_frametable(void)
     memset(end_pg, -1, (unsigned long)top_pg - (unsigned long)end_pg);
 }
 
-#ifndef NDEBUG
-static unsigned int __read_mostly root_pgt_pv_xen_slots
-    = ROOT_PAGETABLE_PV_XEN_SLOTS;
-static l4_pgentry_t __read_mostly split_l4e;
-#else
-#define root_pgt_pv_xen_slots ROOT_PAGETABLE_PV_XEN_SLOTS
-#endif
-
-static void pv_arch_init_memory(void)
-{
-#ifndef NDEBUG
-    unsigned int i;
-
-    if ( highmem_start )
-    {
-        unsigned long split_va = (unsigned long)__va(highmem_start);
-
-        if ( split_va < HYPERVISOR_VIRT_END &&
-             split_va - 1 == (unsigned long)__va(highmem_start - 1) )
-        {
-            root_pgt_pv_xen_slots = l4_table_offset(split_va) -
-                                    ROOT_PAGETABLE_FIRST_XEN_SLOT;
-            ASSERT(root_pgt_pv_xen_slots < ROOT_PAGETABLE_PV_XEN_SLOTS);
-            if ( l4_table_offset(split_va) == l4_table_offset(split_va - 1) )
-            {
-                l3_pgentry_t *l3tab = alloc_xen_pagetable();
-
-                if ( l3tab )
-                {
-                    const l3_pgentry_t *l3idle =
-                        l4e_to_l3e(idle_pg_table[l4_table_offset(split_va)]);
-
-                    for ( i = 0; i < l3_table_offset(split_va); ++i )
-                        l3tab[i] = l3idle[i];
-                    for ( ; i < L3_PAGETABLE_ENTRIES; ++i )
-                        l3tab[i] = l3e_empty();
-                    split_l4e = l4e_from_pfn(virt_to_mfn(l3tab),
-                                             __PAGE_HYPERVISOR_RW);
-                }
-                else
-                    ++root_pgt_pv_xen_slots;
-            }
-        }
-    }
-#endif
-}
-
 void __init arch_init_memory(void)
 {
     unsigned long i, pfn, rstart_pfn, rend_pfn, iostart_pfn, ioend_pfn;
@@ -1424,40 +1378,6 @@ static int alloc_l3_table(struct page_info *page)
     return rc > 0 ? 0 : rc;
 }
 
-/*
- * This function must write all ROOT_PAGETABLE_PV_XEN_SLOTS, to clobber any
- * values a guest may have left there from alloc_l4_table().
- */
-void init_guest_l4_table(l4_pgentry_t l4tab[], const struct domain *d,
-                         bool zap_ro_mpt)
-{
-    /* Xen private mappings. */
-    memcpy(&l4tab[ROOT_PAGETABLE_FIRST_XEN_SLOT],
-           &idle_pg_table[ROOT_PAGETABLE_FIRST_XEN_SLOT],
-           root_pgt_pv_xen_slots * sizeof(l4_pgentry_t));
-#ifndef NDEBUG
-    if ( unlikely(root_pgt_pv_xen_slots < ROOT_PAGETABLE_PV_XEN_SLOTS) )
-    {
-        l4_pgentry_t *next = &l4tab[ROOT_PAGETABLE_FIRST_XEN_SLOT +
-                                    root_pgt_pv_xen_slots];
-
-        if ( l4e_get_intpte(split_l4e) )
-            *next++ = split_l4e;
-
-        memset(next, 0,
-               _p(&l4tab[ROOT_PAGETABLE_LAST_XEN_SLOT + 1]) - _p(next));
-    }
-#else
-    BUILD_BUG_ON(root_pgt_pv_xen_slots != ROOT_PAGETABLE_PV_XEN_SLOTS);
-#endif
-    l4tab[l4_table_offset(LINEAR_PT_VIRT_START)] =
-        l4e_from_pfn(domain_page_map_to_mfn(l4tab), __PAGE_HYPERVISOR_RW);
-    l4tab[l4_table_offset(PERDOMAIN_VIRT_START)] =
-        l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW);
-    if ( zap_ro_mpt || is_pv_32bit_domain(d) )
-        l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty();
-}
-
 bool fill_ro_mpt(mfn_t mfn)
 {
     l4_pgentry_t *l4tab = map_domain_page(mfn);
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index ec7f96d066..dcbee43e8f 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -20,6 +20,8 @@
 #include <asm/page.h>
 #include <asm/setup.h>
 
+#include "mm.h"
+
 /* Allow ring-3 access in long mode as guest cannot use ring 1 ... */
 #define BASE_PROT (_PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_USER)
 #define L1_PROT (BASE_PROT|_PAGE_GUEST_KERNEL)
diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index c8b9cb645b..90d5569be1 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -9,8 +9,13 @@
 #include <xen/lib.h>
 #include <xen/sched.h>
 
+#include <asm/p2m.h>
+#include <asm/paging.h>
+#include <asm/setup.h>
 #include <asm/pv/domain.h>
 
+#include "mm.h"
+
 /* Override macros from asm/page.h to make them work with mfn_t */
 #undef mfn_to_page
 #define mfn_to_page(mfn) __mfn_to_page(mfn_x(mfn))
diff --git a/xen/arch/x86/pv/mm.c b/xen/arch/x86/pv/mm.c
index 6890e80efd..d0fc14dfa6 100644
--- a/xen/arch/x86/pv/mm.c
+++ b/xen/arch/x86/pv/mm.c
@@ -23,6 +23,7 @@
 
 #include <asm/current.h>
 #include <asm/p2m.h>
+#include <asm/setup.h>
 
 #include "mm.h"
 
@@ -32,6 +33,14 @@
 #undef page_to_mfn
 #define page_to_mfn(pg) _mfn(__page_to_mfn(pg))
 
+#ifndef NDEBUG
+static unsigned int __read_mostly root_pgt_pv_xen_slots
+    = ROOT_PAGETABLE_PV_XEN_SLOTS;
+static l4_pgentry_t __read_mostly split_l4e;
+#else
+#define root_pgt_pv_xen_slots ROOT_PAGETABLE_PV_XEN_SLOTS
+#endif
+
 /*
  * Get a mapping of a PV guest's l1e for this linear address.  The return
  * pointer should be unmapped using unmap_domain_page().
@@ -133,6 +142,79 @@ bool pv_map_ldt_shadow_page(unsigned int offset)
     return true;
 }
 
+/*
+ * This function must write all ROOT_PAGETABLE_PV_XEN_SLOTS, to clobber any
+ * values a guest may have left there from alloc_l4_table().
+ */
+void init_guest_l4_table(l4_pgentry_t l4tab[], const struct domain *d,
+                         bool zap_ro_mpt)
+{
+    /* Xen private mappings. */
+    memcpy(&l4tab[ROOT_PAGETABLE_FIRST_XEN_SLOT],
+           &idle_pg_table[ROOT_PAGETABLE_FIRST_XEN_SLOT],
+           root_pgt_pv_xen_slots * sizeof(l4_pgentry_t));
+#ifndef NDEBUG
+    if ( unlikely(root_pgt_pv_xen_slots < ROOT_PAGETABLE_PV_XEN_SLOTS) )
+    {
+        l4_pgentry_t *next = &l4tab[ROOT_PAGETABLE_FIRST_XEN_SLOT +
+                                    root_pgt_pv_xen_slots];
+
+        if ( l4e_get_intpte(split_l4e) )
+            *next++ = split_l4e;
+
+        memset(next, 0,
+               _p(&l4tab[ROOT_PAGETABLE_LAST_XEN_SLOT + 1]) - _p(next));
+    }
+#else
+    BUILD_BUG_ON(root_pgt_pv_xen_slots != ROOT_PAGETABLE_PV_XEN_SLOTS);
+#endif
+    l4tab[l4_table_offset(LINEAR_PT_VIRT_START)] =
+        l4e_from_pfn(domain_page_map_to_mfn(l4tab), __PAGE_HYPERVISOR_RW);
+    l4tab[l4_table_offset(PERDOMAIN_VIRT_START)] =
+        l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW);
+    if ( zap_ro_mpt || is_pv_32bit_domain(d) )
+        l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty();
+}
+
+void pv_arch_init_memory(void)
+{
+#ifndef NDEBUG
+    unsigned int i;
+
+    if ( highmem_start )
+    {
+        unsigned long split_va = (unsigned long)__va(highmem_start);
+
+        if ( split_va < HYPERVISOR_VIRT_END &&
+             split_va - 1 == (unsigned long)__va(highmem_start - 1) )
+        {
+            root_pgt_pv_xen_slots = l4_table_offset(split_va) -
+                                    ROOT_PAGETABLE_FIRST_XEN_SLOT;
+            ASSERT(root_pgt_pv_xen_slots < ROOT_PAGETABLE_PV_XEN_SLOTS);
+            if ( l4_table_offset(split_va) == l4_table_offset(split_va - 1) )
+            {
+                l3_pgentry_t *l3tab = alloc_xen_pagetable();
+
+                if ( l3tab )
+                {
+                    const l3_pgentry_t *l3idle =
+                        l4e_to_l3e(idle_pg_table[l4_table_offset(split_va)]);
+
+                    for ( i = 0; i < l3_table_offset(split_va); ++i )
+                        l3tab[i] = l3idle[i];
+                    for ( ; i < L3_PAGETABLE_ENTRIES; ++i )
+                        l3tab[i] = l3e_empty();
+                    split_l4e = l4e_from_pfn(virt_to_mfn(l3tab),
+                                             __PAGE_HYPERVISOR_RW);
+                }
+                else
+                    ++root_pgt_pv_xen_slots;
+            }
+        }
+    }
+#endif
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/pv/mm.h b/xen/arch/x86/pv/mm.h
index acef061acf..a641964949 100644
--- a/xen/arch/x86/pv/mm.h
+++ b/xen/arch/x86/pv/mm.h
@@ -3,6 +3,9 @@
 
 l1_pgentry_t *map_guest_l1e(unsigned long linear, mfn_t *gl1mfn);
 
+void init_guest_l4_table(l4_pgentry_t l4tab[], const struct domain *d,
+                         bool zap_ro_mpt);
+
 /* Read a PV guest's l1e that maps this linear address. */
 static inline l1_pgentry_t guest_get_eff_l1e(unsigned long linear)
 {
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 8a56bed454..e5087e11e5 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -329,8 +329,6 @@ static inline void *__page_to_virt(const struct page_info *pg)
 int free_page_type(struct page_info *page, unsigned long type,
                    int preemptible);
 
-void init_guest_l4_table(l4_pgentry_t[], const struct domain *,
-                         bool_t zap_ro_mpt);
 bool fill_ro_mpt(mfn_t mfn);
 void zap_ro_mpt(mfn_t mfn);
 
diff --git a/xen/include/asm-x86/pv/mm.h b/xen/include/asm-x86/pv/mm.h
index 47223e38eb..4944a70c7a 100644
--- a/xen/include/asm-x86/pv/mm.h
+++ b/xen/include/asm-x86/pv/mm.h
@@ -30,6 +30,8 @@ void pv_destroy_gdt(struct vcpu *d);
 
 bool pv_map_ldt_shadow_page(unsigned int off);
 
+void pv_arch_init_memory(void);
+
 #else
 
 #include <xen/errno.h>
@@ -47,6 +49,8 @@ static inline void pv_destroy_gdt(struct vcpu *d) {}
 
 static inline bool pv_map_ldt_shadow_page(unsigned int off) { return false; }
 
+static inline void pv_arch_init_memory(void) {}
+
 #endif
 
 #endif /* __X86_PV_MM_H__ */
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 15/23] x86/mm: move declaration of new_guest_cr3 to local pv/mm.h
  2017-09-14 12:58 [PATCH v5 00/23] x86: refactor mm.c Wei Liu
                   ` (13 preceding siblings ...)
  2017-09-14 12:58 ` [PATCH v5 14/23] x86/mm: move PV l4 table setup code Wei Liu
@ 2017-09-14 12:58 ` Wei Liu
  2017-09-22 13:23   ` Jan Beulich
  2017-09-14 12:58 ` [PATCH v5 16/23] x86/mm: add pv prefix to {alloc, free}_page_type Wei Liu
                   ` (7 subsequent siblings)
  22 siblings, 1 reply; 57+ messages in thread
From: Wei Liu @ 2017-09-14 12:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

It is only used by PV code. The function itself can only be moved
together with the rest of the PV mm code, so only move its declaration
for now.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/pv/emul-priv-op.c | 1 +
 xen/arch/x86/pv/mm.h           | 2 ++
 xen/include/asm-x86/mm.h       | 1 -
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index 6dec822237..b5599c1869 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -41,6 +41,7 @@
 
 #include "../x86_64/mmconfig.h"
 #include "emulate.h"
+#include "mm.h"
 
 /* Override macros from asm/page.h to make them work with mfn_t */
 #undef mfn_to_page
diff --git a/xen/arch/x86/pv/mm.h b/xen/arch/x86/pv/mm.h
index a641964949..43e797f201 100644
--- a/xen/arch/x86/pv/mm.h
+++ b/xen/arch/x86/pv/mm.h
@@ -6,6 +6,8 @@ l1_pgentry_t *map_guest_l1e(unsigned long linear, mfn_t *gl1mfn);
 void init_guest_l4_table(l4_pgentry_t l4tab[], const struct domain *d,
                          bool zap_ro_mpt);
 
+int new_guest_cr3(mfn_t mfn);
+
 /* Read a PV guest's l1e that maps this linear address. */
 static inline l1_pgentry_t guest_get_eff_l1e(unsigned long linear)
 {
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index e5087e11e5..f2e0f498c4 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -546,7 +546,6 @@ void audit_domains(void);
 
 #endif
 
-int new_guest_cr3(mfn_t mfn);
 void make_cr3(struct vcpu *v, mfn_t mfn);
 void update_cr3(struct vcpu *v);
 int vcpu_destroy_pagetables(struct vcpu *);
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 16/23] x86/mm: add pv prefix to {alloc, free}_page_type
  2017-09-14 12:58 [PATCH v5 00/23] x86: refactor mm.c Wei Liu
                   ` (14 preceding siblings ...)
  2017-09-14 12:58 ` [PATCH v5 15/23] x86/mm: move declaration of new_guest_cr3 to local pv/mm.h Wei Liu
@ 2017-09-14 12:58 ` Wei Liu
  2017-09-22 13:40   ` Jan Beulich
  2017-09-14 12:58 ` [PATCH v5 17/23] x86/mm: export base_disallow_mask and l1 mask in asm-x86/mm.h Wei Liu
                   ` (6 subsequent siblings)
  22 siblings, 1 reply; 57+ messages in thread
From: Wei Liu @ 2017-09-14 12:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

And move the declarations to the global pv/mm.h. The code will be moved later.

The stubs contain BUG() because they aren't supposed to be called when
PV is disabled.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
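A note on the stub flavour (illustrative, not from the diff): a hook
like pv_arch_init_memory() can use an empty stub because skipping it is
harmless, whereas these two functions hand back a result the caller
acts on, so a silent stub could fake a validation outcome. Hence:

    /* Harmless when skipped: an empty stub is fine. */
    static inline void pv_arch_init_memory(void) {}

    /* Result-bearing: must never run without PV, so trap loudly. */
    static inline int pv_alloc_page_type(struct page_info *page,
                                         unsigned long type, int preemptible)
    { BUG(); return -EINVAL; }
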
 xen/arch/x86/domain.c       |  2 +-
 xen/arch/x86/mm.c           | 14 +++++++-------
 xen/include/asm-x86/mm.h    |  3 ---
 xen/include/asm-x86/pv/mm.h | 12 ++++++++++++
 4 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index e9367bd8aa..3abd37e4dc 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1812,7 +1812,7 @@ static int relinquish_memory(
             if ( likely(y == x) )
             {
                 /* No need for atomic update of type_info here: noone else updates it. */
-                switch ( ret = free_page_type(page, x, 1) )
+                switch ( ret = pv_free_page_type(page, x, 1) )
                 {
                 case 0:
                     break;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 3a919c19b8..86c7466fa0 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1987,8 +1987,8 @@ static void get_page_light(struct page_info *page)
     while ( unlikely(y != x) );
 }
 
-static int alloc_page_type(struct page_info *page, unsigned long type,
-                           int preemptible)
+int pv_alloc_page_type(struct page_info *page, unsigned long type,
+                       int preemptible)
 {
     struct domain *owner = page_get_owner(page);
     int rc;
@@ -2017,7 +2017,7 @@ static int alloc_page_type(struct page_info *page, unsigned long type,
         rc = alloc_segdesc_page(page);
         break;
     default:
-        printk("Bad type in alloc_page_type %lx t=%" PRtype_info " c=%lx\n",
+        printk("Bad type in %s %lx t=%" PRtype_info " c=%lx\n", __func__,
                type, page->u.inuse.type_info,
                page->count_info);
         rc = -EINVAL;
@@ -2061,8 +2061,8 @@ static int alloc_page_type(struct page_info *page, unsigned long type,
 }
 
 
-int free_page_type(struct page_info *page, unsigned long type,
-                   int preemptible)
+int pv_free_page_type(struct page_info *page, unsigned long type,
+                      int preemptible)
 {
     struct domain *owner = page_get_owner(page);
     unsigned long gmfn;
@@ -2119,7 +2119,7 @@ int free_page_type(struct page_info *page, unsigned long type,
 static int __put_final_page_type(
     struct page_info *page, unsigned long type, int preemptible)
 {
-    int rc = free_page_type(page, type, preemptible);
+    int rc = pv_free_page_type(page, type, preemptible);
 
     /* No need for atomic update of type_info here: noone else updates it. */
     if ( rc == 0 )
@@ -2337,7 +2337,7 @@ static int __get_page_type(struct page_info *page, unsigned long type,
             page->nr_validated_ptes = 0;
             page->partial_pte = 0;
         }
-        rc = alloc_page_type(page, type, preemptible);
+        rc = pv_alloc_page_type(page, type, preemptible);
     }
 
     if ( (x & PGT_partial) && !(nx & PGT_partial) )
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index f2e0f498c4..56b2b94195 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -326,9 +326,6 @@ static inline void *__page_to_virt(const struct page_info *pg)
                     (PAGE_SIZE / (sizeof(*pg) & -sizeof(*pg))));
 }
 
-int free_page_type(struct page_info *page, unsigned long type,
-                   int preemptible);
-
 bool fill_ro_mpt(mfn_t mfn);
 void zap_ro_mpt(mfn_t mfn);
 
diff --git a/xen/include/asm-x86/pv/mm.h b/xen/include/asm-x86/pv/mm.h
index 4944a70c7a..0cd8beec39 100644
--- a/xen/include/asm-x86/pv/mm.h
+++ b/xen/include/asm-x86/pv/mm.h
@@ -32,6 +32,11 @@ bool pv_map_ldt_shadow_page(unsigned int off);
 
 void pv_arch_init_memory(void);
 
+int pv_alloc_page_type(struct page_info *page, unsigned long type,
+                       int preemptible);
+int pv_free_page_type(struct page_info *page, unsigned long type,
+                      int preemptible);
+
 #else
 
 #include <xen/errno.h>
@@ -51,6 +56,13 @@ static inline bool pv_map_ldt_shadow_page(unsigned int off) { return false; }
 
 static inline void pv_arch_init_memory(void) {}
 
+static inline int pv_alloc_page_type(struct page_info *page, unsigned long type,
+                                     int preemptible)
+{ BUG(); return -EINVAL; }
+static inline int pv_free_page_type(struct page_info *page, unsigned long type,
+                                    int preemptible)
+{ BUG(); return -EINVAL; }
+
 #endif
 
 #endif /* __X86_PV_MM_H__ */
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 17/23] x86/mm: export base_disallow_mask and l1 mask in asm-x86/mm.h
  2017-09-14 12:58 [PATCH v5 00/23] x86: refactor mm.c Wei Liu
                   ` (15 preceding siblings ...)
  2017-09-14 12:58 ` [PATCH v5 16/23] x86/mm: add pv prefix to {alloc, free}_page_type Wei Liu
@ 2017-09-14 12:58 ` Wei Liu
  2017-09-22 13:52   ` Jan Beulich
  2017-09-14 12:58 ` [PATCH v5 18/23] x86/mm: export some stuff via local mm.h Wei Liu
                   ` (5 subsequent siblings)
  22 siblings, 1 reply; 57+ messages in thread
From: Wei Liu @ 2017-09-14 12:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

The l1 mask is still needed by code that remains in x86/mm.c, while the
l{2,3,4} masks are only needed by PV code. Both the common x86 mm code
and the PV mm code use base_disallow_mask and the l1 mask.

Export base_disallow_mask and the l1 mask via asm-x86/mm.h.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
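For reference, what l1_disallow_mask() computes, written out as a
function (sketch only; same identifiers as in the macro below): a plain
PV domain with no I/O memory, no I/O ports and no passthrough devices
gets the full mask, i.e. it may not set cacheability attributes in its
L1 entries, while domains with hardware access keep PAGE_CACHE_ATTRS
usable:

    static uint32_t l1_disallow(const struct domain *d)
    {
        bool no_hw_access = d != dom_io &&
                            rangeset_is_empty(d->iomem_caps) &&
                            rangeset_is_empty(d->arch.ioport_caps) &&
                            !has_arch_pdevs(d) &&
                            is_pv_domain(d);

        return no_hw_access ? L1_DISALLOW_MASK
                            : L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS;
    }
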
 xen/arch/x86/mm.c        | 12 +-----------
 xen/include/asm-x86/mm.h | 13 +++++++++++++
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 86c7466fa0..e11aac3b90 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -152,9 +152,7 @@ bool __read_mostly machine_to_phys_mapping_valid;
 
 struct rangeset *__read_mostly mmio_ro_ranges;
 
-static uint32_t base_disallow_mask;
-/* Global bit is allowed to be set on L1 PTEs. Intended for user mappings. */
-#define L1_DISALLOW_MASK ((base_disallow_mask | _PAGE_GNTTAB) & ~_PAGE_GLOBAL)
+uint32_t __read_mostly base_disallow_mask;
 
 #define L2_DISALLOW_MASK base_disallow_mask
 
@@ -163,14 +161,6 @@ static uint32_t base_disallow_mask;
 
 #define L4_DISALLOW_MASK (base_disallow_mask)
 
-#define l1_disallow_mask(d)                                     \
-    ((d != dom_io) &&                                           \
-     (rangeset_is_empty((d)->iomem_caps) &&                     \
-      rangeset_is_empty((d)->arch.ioport_caps) &&               \
-      !has_arch_pdevs(d) &&                                     \
-      is_pv_domain(d)) ?                                        \
-     L1_DISALLOW_MASK : (L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS))
-
 static s8 __read_mostly opt_mmio_relax;
 
 static int __init parse_mmio_relax(const char *s)
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 56b2b94195..fbb98e80c6 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -612,4 +612,17 @@ static inline bool arch_mfn_in_directmap(unsigned long mfn)
     return mfn <= (virt_to_mfn(eva - 1) + 1);
 }
 
+extern uint32_t base_disallow_mask;
+
+/* Global bit is allowed to be set on L1 PTEs. Intended for user mappings. */
+#define L1_DISALLOW_MASK ((base_disallow_mask | _PAGE_GNTTAB) & ~_PAGE_GLOBAL)
+
+#define l1_disallow_mask(d)                                     \
+    ((d != dom_io) &&                                           \
+     (rangeset_is_empty((d)->iomem_caps) &&                     \
+      rangeset_is_empty((d)->arch.ioport_caps) &&               \
+      !has_arch_pdevs(d) &&                                     \
+      is_pv_domain(d)) ?                                        \
+     L1_DISALLOW_MASK : (L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS))
+
 #endif /* __ASM_X86_MM_H__ */
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 18/23] x86/mm: export some stuff via local mm.h
  2017-09-14 12:58 [PATCH v5 00/23] x86: refactor mm.c Wei Liu
                   ` (16 preceding siblings ...)
  2017-09-14 12:58 ` [PATCH v5 17/23] x86/mm: export base_disallow_mask and l1 mask in asm-x86/mm.h Wei Liu
@ 2017-09-14 12:58 ` Wei Liu
  2017-09-22 15:44   ` Jan Beulich
  2017-09-14 12:58 ` [PATCH v5 19/23] x86/mm: export get_page_light via asm-x86/mm.h Wei Liu
                   ` (4 subsequent siblings)
  22 siblings, 1 reply; 57+ messages in thread
From: Wei Liu @ 2017-09-14 12:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

They will be used by the PV mm code and the PV mm hypercall code, which
are going to be split out into two separate files.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
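The newly exported get_page_from_l{2,3,4}e() keep their three-way
return convention (1 => entry not present, 0 => success, <0 => error
code). An illustrative caller, modelled on alloc_l2_table():

    rc = get_page_from_l2e(pl2e[i], pfn, d);
    if ( rc > 0 )
        continue;               /* not present: no reference was taken */
    if ( rc < 0 )
    {
        /* Error: drop the references taken for earlier slots. */
        while ( i-- > 0 )
            put_page_from_l2e(pl2e[i], pfn);
        break;
    }
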
 xen/arch/x86/mm.c    | 30 +++++++++++-------------------
 xen/arch/x86/pv/mm.h | 21 +++++++++++++++++++++
 2 files changed, 32 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index e11aac3b90..f9cc5a0f6f 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -154,13 +154,6 @@ struct rangeset *__read_mostly mmio_ro_ranges;
 
 uint32_t __read_mostly base_disallow_mask;
 
-#define L2_DISALLOW_MASK base_disallow_mask
-
-#define l3_disallow_mask(d) (!is_pv_32bit_domain(d) ? \
-                             base_disallow_mask : 0xFFFFF198U)
-
-#define L4_DISALLOW_MASK (base_disallow_mask)
-
 static s8 __read_mostly opt_mmio_relax;
 
 static int __init parse_mmio_relax(const char *s)
@@ -545,9 +538,8 @@ static int alloc_segdesc_page(struct page_info *page)
     return i == 512 ? 0 : -EINVAL;
 }
 
-static int get_page_and_type_from_mfn(
-    mfn_t mfn, unsigned long type, struct domain *d,
-    int partial, int preemptible)
+int get_page_and_type_from_mfn(mfn_t mfn, unsigned long type, struct domain *d,
+                               int partial, int preemptible)
 {
     struct page_info *page = mfn_to_page(mfn);
     int rc;
@@ -930,7 +922,7 @@ get_page_from_l1e(
  *  <0 => error code
  */
 define_get_linear_pagetable(l2);
-static int
+int
 get_page_from_l2e(
     l2_pgentry_t l2e, unsigned long pfn, struct domain *d)
 {
@@ -966,7 +958,7 @@ get_page_from_l2e(
  *  <0 => error code
  */
 define_get_linear_pagetable(l3);
-static int
+int
 get_page_from_l3e(
     l3_pgentry_t l3e, unsigned long pfn, struct domain *d, int partial)
 {
@@ -999,7 +991,7 @@ get_page_from_l3e(
  *  <0 => error code
  */
 define_get_linear_pagetable(l4);
-static int
+int
 get_page_from_l4e(
     l4_pgentry_t l4e, unsigned long pfn, struct domain *d, int partial)
 {
@@ -1087,7 +1079,7 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
  * NB. Virtual address 'l2e' maps to a machine address within frame 'pfn'.
  * Note also that this automatically deals correctly with linear p.t.'s.
  */
-static int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn)
+int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn)
 {
     if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) || (l2e_get_pfn(l2e) == pfn) )
         return 1;
@@ -1105,8 +1097,8 @@ static int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn)
     return 0;
 }
 
-static int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn,
-                             int partial, bool defer)
+int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn, int partial,
+                      bool defer)
 {
     struct page_info *pg;
 
@@ -1143,8 +1135,8 @@ static int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn,
     return put_page_and_type_preemptible(pg);
 }
 
-static int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn,
-                             int partial, bool defer)
+int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn, int partial,
+                      bool defer)
 {
     if ( (l4e_get_flags(l4e) & _PAGE_PRESENT) &&
          (l4e_get_pfn(l4e) != pfn) )
@@ -1206,7 +1198,7 @@ static int alloc_l1_table(struct page_info *page)
     return ret;
 }
 
-static int create_pae_xen_mappings(struct domain *d, l3_pgentry_t *pl3e)
+int create_pae_xen_mappings(struct domain *d, l3_pgentry_t *pl3e)
 {
     struct page_info *page;
     l3_pgentry_t     l3e3;
diff --git a/xen/arch/x86/pv/mm.h b/xen/arch/x86/pv/mm.h
index 43e797f201..b4bb214d95 100644
--- a/xen/arch/x86/pv/mm.h
+++ b/xen/arch/x86/pv/mm.h
@@ -1,6 +1,13 @@
 #ifndef __PV_MM_H__
 #define __PV_MM_H__
 
+#define L2_DISALLOW_MASK base_disallow_mask
+
+#define l3_disallow_mask(d) (!is_pv_32bit_domain(d) ? \
+                             base_disallow_mask : 0xFFFFF198U)
+
+#define L4_DISALLOW_MASK (base_disallow_mask)
+
 l1_pgentry_t *map_guest_l1e(unsigned long linear, mfn_t *gl1mfn);
 
 void init_guest_l4_table(l4_pgentry_t l4tab[], const struct domain *d,
@@ -8,6 +15,8 @@ void init_guest_l4_table(l4_pgentry_t l4tab[], const struct domain *d,
 
 int new_guest_cr3(mfn_t mfn);
 
+int create_pae_xen_mappings(struct domain *d, l3_pgentry_t *pl3e);
+
 /* Read a PV guest's l1e that maps this linear address. */
 static inline l1_pgentry_t guest_get_eff_l1e(unsigned long linear)
 {
@@ -152,4 +161,16 @@ static inline l4_pgentry_t adjust_guest_l4e(l4_pgentry_t l4e,
     return l4e;
 }
 
+int get_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn, struct domain *d);
+int get_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn, struct domain *d,
+                      int partial);
+int get_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn, struct domain *d,
+                      int partial);
+int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn);
+int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn, int partial,
+                      bool defer);
+int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn, int partial,
+                      bool defer);
+int get_page_and_type_from_mfn(mfn_t mfn, unsigned long type, struct domain *d,
+                               int partial, int preemptible);
 #endif /* __PV_MM_H__ */
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 19/23] x86/mm: export get_page_light via asm-x86/mm.h
  2017-09-14 12:58 [PATCH v5 00/23] x86: refactor mm.c Wei Liu
                   ` (17 preceding siblings ...)
  2017-09-14 12:58 ` [PATCH v5 18/23] x86/mm: export some stuff via local mm.h Wei Liu
@ 2017-09-14 12:58 ` Wei Liu
  2017-09-22 15:49   ` Jan Beulich
  2017-09-14 12:58 ` [PATCH v5 20/23] x86/mm: split out PV mm code to pv/mm.c Wei Liu
                   ` (3 subsequent siblings)
  22 siblings, 1 reply; 57+ messages in thread
From: Wei Liu @ 2017-09-14 12:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

It is going to be needed by common x86 mm code and pv mm code.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
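get_page_light() itself is unchanged; for context, the lock-free bump
it performs looks roughly like this (reconstructed sketch, not shown in
the diff): retry a cmpxchg on count_info until the count goes up by
exactly one, relying on the reference the caller already holds so the
call cannot fail:

    unsigned long x, nx, y = page->count_info;

    do {
        x  = y;
        nx = x + 1;
        BUG_ON(!x);             /* page not allocated? */
        y  = cmpxchg(&page->count_info, x, nx);
    } while ( unlikely(y != x) );
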
 xen/arch/x86/mm.c        | 2 +-
 xen/include/asm-x86/mm.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index f9cc5a0f6f..c3a26fe03d 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1955,7 +1955,7 @@ int get_page(struct page_info *page, struct domain *domain)
  *   acquired reference again.
  * Due to get_page() reserving one reference, this call cannot fail.
  */
-static void get_page_light(struct page_info *page)
+void get_page_light(struct page_info *page)
 {
     unsigned long x, nx, y = page->count_info;
 
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index fbb98e80c6..1f62155d56 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -363,6 +363,7 @@ int  put_old_guest_table(struct vcpu *);
 int  get_page_from_l1e(
     l1_pgentry_t l1e, struct domain *l1e_owner, struct domain *pg_owner);
 void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner);
+void get_page_light(struct page_info *page);
 
 static inline bool get_page_from_mfn(mfn_t mfn, struct domain *d)
 {
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 20/23] x86/mm: split out PV mm code to pv/mm.c
  2017-09-14 12:58 [PATCH v5 00/23] x86: refactor mm.c Wei Liu
                   ` (18 preceding siblings ...)
  2017-09-14 12:58 ` [PATCH v5 19/23] x86/mm: export get_page_light via asm-x86/mm.h Wei Liu
@ 2017-09-14 12:58 ` Wei Liu
  2017-09-22 15:53   ` Jan Beulich
  2017-09-14 12:58 ` [PATCH v5 21/23] x86/mm: move and add pv prefix to invalidate_shadow_ldt Wei Liu
                   ` (2 subsequent siblings)
  22 siblings, 1 reply; 57+ messages in thread
From: Wei Liu @ 2017-09-14 12:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

A load of functions are moved:

1. {get,put}_page_from_l{2,3,4}e
2. pv_{alloc,free}_page_type
3. all helpers for the above

The l1e functions can't be moved because they are needed by shadow
code as well.

Fix coding style issues while moving.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
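Of the moved helpers, define_get_linear_pagetable() is the least
obvious: it is a macro stamping out one validation function per page
table level. An illustrative expansion and use for L2 (sketch; the real
caller is in the moved code below):

    /* define_get_linear_pagetable(l2) yields: */
    static int get_l2_linear_pagetable(l2_pgentry_t pde,
                                       unsigned long pde_pfn,
                                       struct domain *d);

    /* Used as a fallback when ordinary validation rejects the entry: */
    rc = get_page_and_type_from_mfn(_mfn(mfn), PGT_l1_page_table, d, 0, 0);
    if ( unlikely(rc == -EINVAL) && get_l2_linear_pagetable(l2e, pfn, d) )
        rc = 0;    /* accepted as a (read-only) linear mapping */
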
 xen/arch/x86/mm.c    | 797 ---------------------------------------------------
 xen/arch/x86/pv/mm.c | 790 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 790 insertions(+), 797 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index c3a26fe03d..3f8d22650d 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -523,108 +523,6 @@ static void invalidate_shadow_ldt(struct vcpu *v, int flush)
 }
 
 
-static int alloc_segdesc_page(struct page_info *page)
-{
-    const struct domain *owner = page_get_owner(page);
-    struct desc_struct *descs = __map_domain_page(page);
-    unsigned i;
-
-    for ( i = 0; i < 512; i++ )
-        if ( unlikely(!check_descriptor(owner, &descs[i])) )
-            break;
-
-    unmap_domain_page(descs);
-
-    return i == 512 ? 0 : -EINVAL;
-}
-
-int get_page_and_type_from_mfn(mfn_t mfn, unsigned long type, struct domain *d,
-                               int partial, int preemptible)
-{
-    struct page_info *page = mfn_to_page(mfn);
-    int rc;
-
-    if ( likely(partial >= 0) &&
-         unlikely(!get_page_from_mfn(mfn, d)) )
-        return -EINVAL;
-
-    rc = (preemptible ?
-          get_page_type_preemptible(page, type) :
-          (get_page_type(page, type) ? 0 : -EINVAL));
-
-    if ( unlikely(rc) && partial >= 0 &&
-         (!preemptible || page != current->arch.old_guest_table) )
-        put_page(page);
-
-    return rc;
-}
-
-static void put_data_page(
-    struct page_info *page, int writeable)
-{
-    if ( writeable )
-        put_page_and_type(page);
-    else
-        put_page(page);
-}
-
-/*
- * We allow root tables to map each other (a.k.a. linear page tables). It
- * needs some special care with reference counts and access permissions:
- *  1. The mapping entry must be read-only, or the guest may get write access
- *     to its own PTEs.
- *  2. We must only bump the reference counts for an *already validated*
- *     L2 table, or we can end up in a deadlock in get_page_type() by waiting
- *     on a validation that is required to complete that validation.
- *  3. We only need to increment the reference counts for the mapped page
- *     frame if it is mapped by a different root table. This is sufficient and
- *     also necessary to allow validation of a root table mapping itself.
- */
-#define define_get_linear_pagetable(level)                                  \
-static int                                                                  \
-get_##level##_linear_pagetable(                                             \
-    level##_pgentry_t pde, unsigned long pde_pfn, struct domain *d)         \
-{                                                                           \
-    unsigned long x, y;                                                     \
-    struct page_info *page;                                                 \
-    unsigned long pfn;                                                      \
-                                                                            \
-    if ( (level##e_get_flags(pde) & _PAGE_RW) )                             \
-    {                                                                       \
-        gdprintk(XENLOG_WARNING,                                            \
-                 "Attempt to create linear p.t. with write perms\n");       \
-        return 0;                                                           \
-    }                                                                       \
-                                                                            \
-    if ( (pfn = level##e_get_pfn(pde)) != pde_pfn )                         \
-    {                                                                       \
-        /* Make sure the mapped frame belongs to the correct domain. */     \
-        if ( unlikely(!get_page_from_mfn(_mfn(pfn), d)) )                   \
-            return 0;                                                       \
-                                                                            \
-        /*                                                                  \
-         * Ensure that the mapped frame is an already-validated page table. \
-         * If so, atomically increment the count (checking for overflow).   \
-         */                                                                 \
-        page = mfn_to_page(_mfn(pfn));                                      \
-        y = page->u.inuse.type_info;                                        \
-        do {                                                                \
-            x = y;                                                          \
-            if ( unlikely((x & PGT_count_mask) == PGT_count_mask) ||        \
-                 unlikely((x & (PGT_type_mask|PGT_validated)) !=            \
-                          (PGT_##level##_page_table|PGT_validated)) )       \
-            {                                                               \
-                put_page(page);                                             \
-                return 0;                                                   \
-            }                                                               \
-        }                                                                   \
-        while ( (y = cmpxchg(&page->u.inuse.type_info, x, x + 1)) != x );   \
-    }                                                                       \
-                                                                            \
-    return 1;                                                               \
-}
-
-
 bool is_iomem_page(mfn_t mfn)
 {
     struct page_info *page;
@@ -913,108 +811,6 @@ get_page_from_l1e(
     return -EBUSY;
 }
 
-
-/* NB. Virtual address 'l2e' maps to a machine address within frame 'pfn'. */
-/*
- * get_page_from_l2e returns:
- *   1 => page not present
- *   0 => success
- *  <0 => error code
- */
-define_get_linear_pagetable(l2);
-int
-get_page_from_l2e(
-    l2_pgentry_t l2e, unsigned long pfn, struct domain *d)
-{
-    unsigned long mfn = l2e_get_pfn(l2e);
-    int rc;
-
-    if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) )
-        return 1;
-
-    if ( unlikely((l2e_get_flags(l2e) & L2_DISALLOW_MASK)) )
-    {
-        gdprintk(XENLOG_WARNING, "Bad L2 flags %x\n",
-                 l2e_get_flags(l2e) & L2_DISALLOW_MASK);
-        return -EINVAL;
-    }
-
-    if ( !(l2e_get_flags(l2e) & _PAGE_PSE) )
-    {
-        rc = get_page_and_type_from_mfn(_mfn(mfn), PGT_l1_page_table, d, 0, 0);
-        if ( unlikely(rc == -EINVAL) && get_l2_linear_pagetable(l2e, pfn, d) )
-            rc = 0;
-        return rc;
-    }
-
-    return -EINVAL;
-}
-
-
-/*
- * get_page_from_l3e returns:
- *   1 => page not present
- *   0 => success
- *  <0 => error code
- */
-define_get_linear_pagetable(l3);
-int
-get_page_from_l3e(
-    l3_pgentry_t l3e, unsigned long pfn, struct domain *d, int partial)
-{
-    int rc;
-
-    if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) )
-        return 1;
-
-    if ( unlikely((l3e_get_flags(l3e) & l3_disallow_mask(d))) )
-    {
-        gdprintk(XENLOG_WARNING, "Bad L3 flags %x\n",
-                 l3e_get_flags(l3e) & l3_disallow_mask(d));
-        return -EINVAL;
-    }
-
-    rc = get_page_and_type_from_mfn(
-        l3e_get_mfn(l3e), PGT_l2_page_table, d, partial, 1);
-    if ( unlikely(rc == -EINVAL) &&
-         !is_pv_32bit_domain(d) &&
-         get_l3_linear_pagetable(l3e, pfn, d) )
-        rc = 0;
-
-    return rc;
-}
-
-/*
- * get_page_from_l4e returns:
- *   1 => page not present
- *   0 => success
- *  <0 => error code
- */
-define_get_linear_pagetable(l4);
-int
-get_page_from_l4e(
-    l4_pgentry_t l4e, unsigned long pfn, struct domain *d, int partial)
-{
-    int rc;
-
-    if ( !(l4e_get_flags(l4e) & _PAGE_PRESENT) )
-        return 1;
-
-    if ( unlikely((l4e_get_flags(l4e) & L4_DISALLOW_MASK)) )
-    {
-        gdprintk(XENLOG_WARNING, "Bad L4 flags %x\n",
-                 l4e_get_flags(l4e) & L4_DISALLOW_MASK);
-        return -EINVAL;
-    }
-
-    rc = get_page_and_type_from_mfn(
-        l4e_get_mfn(l4e), PGT_l3_page_table, d, partial, 1);
-    if ( unlikely(rc == -EINVAL) && get_l4_linear_pagetable(l4e, pfn, d) )
-        rc = 0;
-
-    return rc;
-}
-
 void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
 {
     unsigned long     pfn = l1e_get_pfn(l1e);
@@ -1074,292 +870,6 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
     }
 }
 
-
-/*
- * NB. Virtual address 'l2e' maps to a machine address within frame 'pfn'.
- * Note also that this automatically deals correctly with linear p.t.'s.
- */
-int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn)
-{
-    if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) || (l2e_get_pfn(l2e) == pfn) )
-        return 1;
-
-    if ( l2e_get_flags(l2e) & _PAGE_PSE )
-    {
-        struct page_info *page = l2e_get_page(l2e);
-        unsigned int i;
-
-        for ( i = 0; i < (1u << PAGETABLE_ORDER); i++, page++ )
-            put_page_and_type(page);
-    } else
-        put_page_and_type(l2e_get_page(l2e));
-
-    return 0;
-}
-
-int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn, int partial,
-                      bool defer)
-{
-    struct page_info *pg;
-
-    if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) || (l3e_get_pfn(l3e) == pfn) )
-        return 1;
-
-    if ( unlikely(l3e_get_flags(l3e) & _PAGE_PSE) )
-    {
-        unsigned long mfn = l3e_get_pfn(l3e);
-        int writeable = l3e_get_flags(l3e) & _PAGE_RW;
-
-        ASSERT(!(mfn & ((1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT)) - 1)));
-        do {
-            put_data_page(mfn_to_page(_mfn(mfn)), writeable);
-        } while ( ++mfn & ((1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT)) - 1) );
-
-        return 0;
-    }
-
-    pg = l3e_get_page(l3e);
-
-    if ( unlikely(partial > 0) )
-    {
-        ASSERT(!defer);
-        return put_page_type_preemptible(pg);
-    }
-
-    if ( defer )
-    {
-        current->arch.old_guest_table = pg;
-        return 0;
-    }
-
-    return put_page_and_type_preemptible(pg);
-}
-
-int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn, int partial,
-                      bool defer)
-{
-    if ( (l4e_get_flags(l4e) & _PAGE_PRESENT) &&
-         (l4e_get_pfn(l4e) != pfn) )
-    {
-        struct page_info *pg = l4e_get_page(l4e);
-
-        if ( unlikely(partial > 0) )
-        {
-            ASSERT(!defer);
-            return put_page_type_preemptible(pg);
-        }
-
-        if ( defer )
-        {
-            current->arch.old_guest_table = pg;
-            return 0;
-        }
-
-        return put_page_and_type_preemptible(pg);
-    }
-    return 1;
-}
-
-static int alloc_l1_table(struct page_info *page)
-{
-    struct domain *d = page_get_owner(page);
-    l1_pgentry_t  *pl1e;
-    unsigned int   i;
-    int            ret = 0;
-
-    pl1e = __map_domain_page(page);
-
-    for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
-    {
-        switch ( ret = get_page_from_l1e(pl1e[i], d, d) )
-        {
-        default:
-            goto fail;
-        case 0:
-            break;
-        case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
-            ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
-            l1e_flip_flags(pl1e[i], ret);
-            break;
-        }
-
-        pl1e[i] = adjust_guest_l1e(pl1e[i], d);
-    }
-
-    unmap_domain_page(pl1e);
-    return 0;
-
- fail:
-    gdprintk(XENLOG_WARNING, "Failure in alloc_l1_table: slot %#x\n", i);
-    while ( i-- > 0 )
-        put_page_from_l1e(pl1e[i], d);
-
-    unmap_domain_page(pl1e);
-    return ret;
-}
-
-int create_pae_xen_mappings(struct domain *d, l3_pgentry_t *pl3e)
-{
-    struct page_info *page;
-    l3_pgentry_t     l3e3;
-
-    if ( !is_pv_32bit_domain(d) )
-        return 1;
-
-    pl3e = (l3_pgentry_t *)((unsigned long)pl3e & PAGE_MASK);
-
-    /* 3rd L3 slot contains L2 with Xen-private mappings. It *must* exist. */
-    l3e3 = pl3e[3];
-    if ( !(l3e_get_flags(l3e3) & _PAGE_PRESENT) )
-    {
-        gdprintk(XENLOG_WARNING, "PAE L3 3rd slot is empty\n");
-        return 0;
-    }
-
-    /*
-     * The Xen-private mappings include linear mappings. The L2 thus cannot
-     * be shared by multiple L3 tables. The test here is adequate because:
-     *  1. Cannot appear in slots != 3 because get_page_type() checks the
-     *     PGT_pae_xen_l2 flag, which is asserted iff the L2 appears in slot 3
-     *  2. Cannot appear in another page table's L3:
-     *     a. alloc_l3_table() calls this function and this check will fail
-     *     b. mod_l3_entry() disallows updates to slot 3 in an existing table
-     */
-    page = l3e_get_page(l3e3);
-    BUG_ON(page->u.inuse.type_info & PGT_pinned);
-    BUG_ON((page->u.inuse.type_info & PGT_count_mask) == 0);
-    BUG_ON(!(page->u.inuse.type_info & PGT_pae_xen_l2));
-    if ( (page->u.inuse.type_info & PGT_count_mask) != 1 )
-    {
-        gdprintk(XENLOG_WARNING, "PAE L3 3rd slot is shared\n");
-        return 0;
-    }
-
-    return 1;
-}
-
-static int alloc_l2_table(struct page_info *page, unsigned long type,
-                          int preemptible)
-{
-    struct domain *d = page_get_owner(page);
-    unsigned long  pfn = mfn_x(page_to_mfn(page));
-    l2_pgentry_t  *pl2e;
-    unsigned int   i;
-    int            rc = 0;
-
-    pl2e = map_domain_page(_mfn(pfn));
-
-    for ( i = page->nr_validated_ptes; i < L2_PAGETABLE_ENTRIES; i++ )
-    {
-        if ( preemptible && i > page->nr_validated_ptes
-             && hypercall_preempt_check() )
-        {
-            page->nr_validated_ptes = i;
-            rc = -ERESTART;
-            break;
-        }
-
-        if ( !is_guest_l2_slot(d, type, i) ||
-             (rc = get_page_from_l2e(pl2e[i], pfn, d)) > 0 )
-            continue;
-
-        if ( rc < 0 )
-        {
-            gdprintk(XENLOG_WARNING, "Failure in alloc_l2_table: slot %#x\n", i);
-            while ( i-- > 0 )
-                if ( is_guest_l2_slot(d, type, i) )
-                    put_page_from_l2e(pl2e[i], pfn);
-            break;
-        }
-
-        pl2e[i] = adjust_guest_l2e(pl2e[i], d);
-    }
-
-    if ( rc >= 0 && (type & PGT_pae_xen_l2) )
-    {
-        /* Xen private mappings. */
-        memcpy(&pl2e[COMPAT_L2_PAGETABLE_FIRST_XEN_SLOT(d)],
-               &compat_idle_pg_table_l2[
-                   l2_table_offset(HIRO_COMPAT_MPT_VIRT_START)],
-               COMPAT_L2_PAGETABLE_XEN_SLOTS(d) * sizeof(*pl2e));
-    }
-
-    unmap_domain_page(pl2e);
-    return rc > 0 ? 0 : rc;
-}
-
-static int alloc_l3_table(struct page_info *page)
-{
-    struct domain *d = page_get_owner(page);
-    unsigned long  pfn = mfn_x(page_to_mfn(page));
-    l3_pgentry_t  *pl3e;
-    unsigned int   i;
-    int            rc = 0, partial = page->partial_pte;
-
-    pl3e = map_domain_page(_mfn(pfn));
-
-    /*
-     * PAE guests allocate full pages, but aren't required to initialize
-     * more than the first four entries; when running in compatibility
-     * mode, however, the full page is visible to the MMU, and hence all
-     * 512 entries must be valid/verified, which is most easily achieved
-     * by clearing them out.
-     */
-    if ( is_pv_32bit_domain(d) )
-        memset(pl3e + 4, 0, (L3_PAGETABLE_ENTRIES - 4) * sizeof(*pl3e));
-
-    for ( i = page->nr_validated_ptes; i < L3_PAGETABLE_ENTRIES;
-          i++, partial = 0 )
-    {
-        if ( is_pv_32bit_domain(d) && (i == 3) )
-        {
-            if ( !(l3e_get_flags(pl3e[i]) & _PAGE_PRESENT) ||
-                 (l3e_get_flags(pl3e[i]) & l3_disallow_mask(d)) )
-                rc = -EINVAL;
-            else
-                rc = get_page_and_type_from_mfn(
-                    l3e_get_mfn(pl3e[i]),
-                    PGT_l2_page_table | PGT_pae_xen_l2, d, partial, 1);
-        }
-        else if ( (rc = get_page_from_l3e(pl3e[i], pfn, d, partial)) > 0 )
-            continue;
-
-        if ( rc == -ERESTART )
-        {
-            page->nr_validated_ptes = i;
-            page->partial_pte = partial ?: 1;
-        }
-        else if ( rc == -EINTR && i )
-        {
-            page->nr_validated_ptes = i;
-            page->partial_pte = 0;
-            rc = -ERESTART;
-        }
-        if ( rc < 0 )
-            break;
-
-        pl3e[i] = adjust_guest_l3e(pl3e[i], d);
-    }
-
-    if ( rc >= 0 && !create_pae_xen_mappings(d, pl3e) )
-        rc = -EINVAL;
-    if ( rc < 0 && rc != -ERESTART && rc != -EINTR )
-    {
-        gdprintk(XENLOG_WARNING, "Failure in alloc_l3_table: slot %#x\n", i);
-        if ( i )
-        {
-            page->nr_validated_ptes = i;
-            page->partial_pte = 0;
-            current->arch.old_guest_table = page;
-        }
-        while ( i-- > 0 )
-            pl3e[i] = unadjust_guest_l3e(pl3e[i], d);
-    }
-
-    unmap_domain_page(pl3e);
-    return rc > 0 ? 0 : rc;
-}
-
 bool fill_ro_mpt(mfn_t mfn)
 {
     l4_pgentry_t *l4tab = map_domain_page(mfn);
@@ -1384,184 +894,6 @@ void zap_ro_mpt(mfn_t mfn)
     unmap_domain_page(l4tab);
 }
 
-static int alloc_l4_table(struct page_info *page)
-{
-    struct domain *d = page_get_owner(page);
-    unsigned long  pfn = mfn_x(page_to_mfn(page));
-    l4_pgentry_t  *pl4e = map_domain_page(_mfn(pfn));
-    unsigned int   i;
-    int            rc = 0, partial = page->partial_pte;
-
-    for ( i = page->nr_validated_ptes; i < L4_PAGETABLE_ENTRIES;
-          i++, partial = 0 )
-    {
-        if ( !is_guest_l4_slot(d, i) ||
-             (rc = get_page_from_l4e(pl4e[i], pfn, d, partial)) > 0 )
-            continue;
-
-        if ( rc == -ERESTART )
-        {
-            page->nr_validated_ptes = i;
-            page->partial_pte = partial ?: 1;
-        }
-        else if ( rc < 0 )
-        {
-            if ( rc != -EINTR )
-                gdprintk(XENLOG_WARNING,
-                         "Failure in alloc_l4_table: slot %#x\n", i);
-            if ( i )
-            {
-                page->nr_validated_ptes = i;
-                page->partial_pte = 0;
-                if ( rc == -EINTR )
-                    rc = -ERESTART;
-                else
-                {
-                    if ( current->arch.old_guest_table )
-                        page->nr_validated_ptes++;
-                    current->arch.old_guest_table = page;
-                }
-            }
-        }
-        if ( rc < 0 )
-        {
-            unmap_domain_page(pl4e);
-            return rc;
-        }
-
-        pl4e[i] = adjust_guest_l4e(pl4e[i], d);
-    }
-
-    if ( rc >= 0 )
-    {
-        init_guest_l4_table(pl4e, d, !VM_ASSIST(d, m2p_strict));
-        atomic_inc(&d->arch.pv_domain.nr_l4_pages);
-        rc = 0;
-    }
-    unmap_domain_page(pl4e);
-
-    return rc;
-}
-
-static void free_l1_table(struct page_info *page)
-{
-    struct domain *d = page_get_owner(page);
-    l1_pgentry_t *pl1e;
-    unsigned int  i;
-
-    pl1e = __map_domain_page(page);
-
-    for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
-        put_page_from_l1e(pl1e[i], d);
-
-    unmap_domain_page(pl1e);
-}
-
-
-static int free_l2_table(struct page_info *page, int preemptible)
-{
-    struct domain *d = page_get_owner(page);
-    unsigned long pfn = mfn_x(page_to_mfn(page));
-    l2_pgentry_t *pl2e;
-    unsigned int  i = page->nr_validated_ptes - 1;
-    int err = 0;
-
-    pl2e = map_domain_page(_mfn(pfn));
-
-    ASSERT(page->nr_validated_ptes);
-    do {
-        if ( is_guest_l2_slot(d, page->u.inuse.type_info, i) &&
-             put_page_from_l2e(pl2e[i], pfn) == 0 &&
-             preemptible && i && hypercall_preempt_check() )
-        {
-           page->nr_validated_ptes = i;
-           err = -ERESTART;
-        }
-    } while ( !err && i-- );
-
-    unmap_domain_page(pl2e);
-
-    if ( !err )
-        page->u.inuse.type_info &= ~PGT_pae_xen_l2;
-
-    return err;
-}
-
-static int free_l3_table(struct page_info *page)
-{
-    struct domain *d = page_get_owner(page);
-    unsigned long pfn = mfn_x(page_to_mfn(page));
-    l3_pgentry_t *pl3e;
-    int rc = 0, partial = page->partial_pte;
-    unsigned int  i = page->nr_validated_ptes - !partial;
-
-    pl3e = map_domain_page(_mfn(pfn));
-
-    do {
-        rc = put_page_from_l3e(pl3e[i], pfn, partial, 0);
-        if ( rc < 0 )
-            break;
-        partial = 0;
-        if ( rc > 0 )
-            continue;
-        pl3e[i] = unadjust_guest_l3e(pl3e[i], d);
-    } while ( i-- );
-
-    unmap_domain_page(pl3e);
-
-    if ( rc == -ERESTART )
-    {
-        page->nr_validated_ptes = i;
-        page->partial_pte = partial ?: -1;
-    }
-    else if ( rc == -EINTR && i < L3_PAGETABLE_ENTRIES - 1 )
-    {
-        page->nr_validated_ptes = i + 1;
-        page->partial_pte = 0;
-        rc = -ERESTART;
-    }
-    return rc > 0 ? 0 : rc;
-}
-
-static int free_l4_table(struct page_info *page)
-{
-    struct domain *d = page_get_owner(page);
-    unsigned long pfn = mfn_x(page_to_mfn(page));
-    l4_pgentry_t *pl4e = map_domain_page(_mfn(pfn));
-    int rc = 0, partial = page->partial_pte;
-    unsigned int  i = page->nr_validated_ptes - !partial;
-
-    do {
-        if ( is_guest_l4_slot(d, i) )
-            rc = put_page_from_l4e(pl4e[i], pfn, partial, 0);
-        if ( rc < 0 )
-            break;
-        partial = 0;
-    } while ( i-- );
-
-    if ( rc == -ERESTART )
-    {
-        page->nr_validated_ptes = i;
-        page->partial_pte = partial ?: -1;
-    }
-    else if ( rc == -EINTR && i < L4_PAGETABLE_ENTRIES - 1 )
-    {
-        page->nr_validated_ptes = i + 1;
-        page->partial_pte = 0;
-        rc = -ERESTART;
-    }
-
-    unmap_domain_page(pl4e);
-
-    if ( rc >= 0 )
-    {
-        atomic_dec(&d->arch.pv_domain.nr_l4_pages);
-        rc = 0;
-    }
-
-    return rc;
-}
-
 int page_lock(struct page_info *page)
 {
     unsigned long x, nx;
@@ -1969,135 +1301,6 @@ void get_page_light(struct page_info *page)
     while ( unlikely(y != x) );
 }
 
-int pv_alloc_page_type(struct page_info *page, unsigned long type,
-                       int preemptible)
-{
-    struct domain *owner = page_get_owner(page);
-    int rc;
-
-    /* A page table is dirtied when its type count becomes non-zero. */
-    if ( likely(owner != NULL) )
-        paging_mark_dirty(owner, page_to_mfn(page));
-
-    switch ( type & PGT_type_mask )
-    {
-    case PGT_l1_page_table:
-        rc = alloc_l1_table(page);
-        break;
-    case PGT_l2_page_table:
-        rc = alloc_l2_table(page, type, preemptible);
-        break;
-    case PGT_l3_page_table:
-        ASSERT(preemptible);
-        rc = alloc_l3_table(page);
-        break;
-    case PGT_l4_page_table:
-        ASSERT(preemptible);
-        rc = alloc_l4_table(page);
-        break;
-    case PGT_seg_desc_page:
-        rc = alloc_segdesc_page(page);
-        break;
-    default:
-        printk("Bad type in %s %lx t=%" PRtype_info " c=%lx\n", __func__,
-               type, page->u.inuse.type_info,
-               page->count_info);
-        rc = -EINVAL;
-        BUG();
-    }
-
-    /* No need for atomic update of type_info here: noone else updates it. */
-    smp_wmb();
-    switch ( rc )
-    {
-    case 0:
-        page->u.inuse.type_info |= PGT_validated;
-        break;
-    case -EINTR:
-        ASSERT((page->u.inuse.type_info &
-                (PGT_count_mask|PGT_validated|PGT_partial)) == 1);
-        page->u.inuse.type_info &= ~PGT_count_mask;
-        break;
-    default:
-        ASSERT(rc < 0);
-        gdprintk(XENLOG_WARNING, "Error while validating mfn %" PRI_mfn
-                 " (pfn %" PRI_pfn ") for type %" PRtype_info
-                 ": caf=%08lx taf=%" PRtype_info "\n",
-                 mfn_x(page_to_mfn(page)),
-                 get_gpfn_from_mfn(mfn_x(page_to_mfn(page))),
-                 type, page->count_info, page->u.inuse.type_info);
-        if ( page != current->arch.old_guest_table )
-            page->u.inuse.type_info = 0;
-        else
-        {
-            ASSERT((page->u.inuse.type_info &
-                    (PGT_count_mask | PGT_validated)) == 1);
-    case -ERESTART:
-            get_page_light(page);
-            page->u.inuse.type_info |= PGT_partial;
-        }
-        break;
-    }
-
-    return rc;
-}
-
-
-int pv_free_page_type(struct page_info *page, unsigned long type,
-                      int preemptible)
-{
-    struct domain *owner = page_get_owner(page);
-    unsigned long gmfn;
-    int rc;
-
-    if ( likely(owner != NULL) && unlikely(paging_mode_enabled(owner)) )
-    {
-        /* A page table is dirtied when its type count becomes zero. */
-        paging_mark_dirty(owner, page_to_mfn(page));
-
-        ASSERT(!shadow_mode_refcounts(owner));
-
-        gmfn = mfn_to_gmfn(owner, mfn_x(page_to_mfn(page)));
-        ASSERT(VALID_M2P(gmfn));
-        /* Page sharing not supported for shadowed domains */
-        if(!SHARED_M2P(gmfn))
-            shadow_remove_all_shadows(owner, _mfn(gmfn));
-    }
-
-    if ( !(type & PGT_partial) )
-    {
-        page->nr_validated_ptes = 1U << PAGETABLE_ORDER;
-        page->partial_pte = 0;
-    }
-
-    switch ( type & PGT_type_mask )
-    {
-    case PGT_l1_page_table:
-        free_l1_table(page);
-        rc = 0;
-        break;
-    case PGT_l2_page_table:
-        rc = free_l2_table(page, preemptible);
-        break;
-    case PGT_l3_page_table:
-        ASSERT(preemptible);
-        rc = free_l3_table(page);
-        break;
-    case PGT_l4_page_table:
-        ASSERT(preemptible);
-        rc = free_l4_table(page);
-        break;
-    default:
-        gdprintk(XENLOG_WARNING, "type %" PRtype_info " mfn %" PRI_mfn "\n",
-                 type, mfn_x(page_to_mfn(page)));
-        rc = -EINVAL;
-        BUG();
-    }
-
-    return rc;
-}
-
-
 static int __put_final_page_type(
     struct page_info *page, unsigned long type, int preemptible)
 {
diff --git a/xen/arch/x86/pv/mm.c b/xen/arch/x86/pv/mm.c
index d0fc14dfa6..47cdf58dcf 100644
--- a/xen/arch/x86/pv/mm.c
+++ b/xen/arch/x86/pv/mm.c
@@ -19,11 +19,14 @@
  * License along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <xen/event.h>
 #include <xen/guest_access.h>
 
 #include <asm/current.h>
+#include <asm/mm.h>
 #include <asm/p2m.h>
 #include <asm/setup.h>
+#include <asm/shadow.h>
 
 #include "mm.h"
 
@@ -41,6 +44,273 @@ static l4_pgentry_t __read_mostly split_l4e;
 #define root_pgt_pv_xen_slots ROOT_PAGETABLE_PV_XEN_SLOTS
 #endif
 
+/*
+ * We allow root tables to map each other (a.k.a. linear page tables). It
+ * needs some special care with reference counts and access permissions:
+ *  1. The mapping entry must be read-only, or the guest may get write access
+ *     to its own PTEs.
+ *  2. We must only bump the reference counts for an *already validated*
+ *     L2 table, or we can end up in a deadlock in get_page_type() by waiting
+ *     on a validation that is required to complete that validation.
+ *  3. We only need to increment the reference counts for the mapped page
+ *     frame if it is mapped by a different root table. This is sufficient and
+ *     also necessary to allow validation of a root table mapping itself.
+ */
+#define define_get_linear_pagetable(level)                                  \
+static int                                                                  \
+get_##level##_linear_pagetable(                                             \
+    level##_pgentry_t pde, unsigned long pde_pfn, struct domain *d)         \
+{                                                                           \
+    unsigned long x, y;                                                     \
+    struct page_info *page;                                                 \
+    unsigned long pfn;                                                      \
+                                                                            \
+    if ( (level##e_get_flags(pde) & _PAGE_RW) )                             \
+    {                                                                       \
+        gdprintk(XENLOG_WARNING,                                            \
+                 "Attempt to create linear p.t. with write perms\n");       \
+        return 0;                                                           \
+    }                                                                       \
+                                                                            \
+    if ( (pfn = level##e_get_pfn(pde)) != pde_pfn )                         \
+    {                                                                       \
+        /* Make sure the mapped frame belongs to the correct domain. */     \
+        if ( unlikely(!get_page_from_mfn(_mfn(pfn), d)) )                   \
+            return 0;                                                       \
+                                                                            \
+        /*                                                                  \
+         * Ensure that the mapped frame is an already-validated page table. \
+         * If so, atomically increment the count (checking for overflow).   \
+         */                                                                 \
+        page = mfn_to_page(_mfn(pfn));                                      \
+        y = page->u.inuse.type_info;                                        \
+        do {                                                                \
+            x = y;                                                          \
+            if ( unlikely((x & PGT_count_mask) == PGT_count_mask) ||        \
+                 unlikely((x & (PGT_type_mask|PGT_validated)) !=            \
+                          (PGT_##level##_page_table|PGT_validated)) )       \
+            {                                                               \
+                put_page(page);                                             \
+                return 0;                                                   \
+            }                                                               \
+        }                                                                   \
+        while ( (y = cmpxchg(&page->u.inuse.type_info, x, x + 1)) != x );   \
+    }                                                                       \
+                                                                            \
+    return 1;                                                               \
+}
+
+int get_page_and_type_from_mfn(mfn_t mfn, unsigned long type, struct domain *d,
+                               int partial, int preemptible)
+{
+    struct page_info *page = mfn_to_page(mfn);
+    int rc;
+
+    if ( likely(partial >= 0) &&
+         unlikely(!get_page_from_mfn(mfn, d)) )
+        return -EINVAL;
+
+    rc = (preemptible ?
+          get_page_type_preemptible(page, type) :
+          (get_page_type(page, type) ? 0 : -EINVAL));
+
+    if ( unlikely(rc) && partial >= 0 &&
+         (!preemptible || page != current->arch.old_guest_table) )
+        put_page(page);
+
+    return rc;
+}
+
+/* NB. Virtual address 'l2e' maps to a machine address within frame 'pfn'. */
+/*
+ * get_page_from_l2e returns:
+ *   1 => page not present
+ *   0 => success
+ *  <0 => error code
+ */
+define_get_linear_pagetable(l2);
+int get_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn, struct domain *d)
+{
+    unsigned long mfn = l2e_get_pfn(l2e);
+    int rc;
+
+    if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) )
+        return 1;
+
+    if ( unlikely((l2e_get_flags(l2e) & L2_DISALLOW_MASK)) )
+    {
+        gdprintk(XENLOG_WARNING, "Bad L2 flags %x\n",
+                 l2e_get_flags(l2e) & L2_DISALLOW_MASK);
+        return -EINVAL;
+    }
+
+    if ( !(l2e_get_flags(l2e) & _PAGE_PSE) )
+    {
+        rc = get_page_and_type_from_mfn(_mfn(mfn), PGT_l1_page_table, d, 0, 0);
+        if ( unlikely(rc == -EINVAL) && get_l2_linear_pagetable(l2e, pfn, d) )
+            rc = 0;
+        return rc;
+    }
+
+    return -EINVAL;
+}
+
+
+/*
+ * get_page_from_l3e returns:
+ *   1 => page not present
+ *   0 => success
+ *  <0 => error code
+ */
+define_get_linear_pagetable(l3);
+int get_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn, struct domain *d,
+                      int partial)
+{
+    int rc;
+
+    if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) )
+        return 1;
+
+    if ( unlikely((l3e_get_flags(l3e) & l3_disallow_mask(d))) )
+    {
+        gdprintk(XENLOG_WARNING, "Bad L3 flags %x\n",
+                 l3e_get_flags(l3e) & l3_disallow_mask(d));
+        return -EINVAL;
+    }
+
+    rc = get_page_and_type_from_mfn(
+        l3e_get_mfn(l3e), PGT_l2_page_table, d, partial, 1);
+    if ( unlikely(rc == -EINVAL) &&
+         !is_pv_32bit_domain(d) &&
+         get_l3_linear_pagetable(l3e, pfn, d) )
+        rc = 0;
+
+    return rc;
+}
+
+/*
+ * get_page_from_l4e returns:
+ *   1 => page not present
+ *   0 => success
+ *  <0 => error code
+ */
+define_get_linear_pagetable(l4);
+int get_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn, struct domain *d,
+                      int partial)
+{
+    int rc;
+
+    if ( !(l4e_get_flags(l4e) & _PAGE_PRESENT) )
+        return 1;
+
+    if ( unlikely((l4e_get_flags(l4e) & L4_DISALLOW_MASK)) )
+    {
+        gdprintk(XENLOG_WARNING, "Bad L4 flags %x\n",
+                 l4e_get_flags(l4e) & L4_DISALLOW_MASK);
+        return -EINVAL;
+    }
+
+    rc = get_page_and_type_from_mfn(
+        l4e_get_mfn(l4e), PGT_l3_page_table, d, partial, 1);
+    if ( unlikely(rc == -EINVAL) && get_l4_linear_pagetable(l4e, pfn, d) )
+        rc = 0;
+
+    return rc;
+}
+
+/*
+ * NB. Virtual address 'l2e' maps to a machine address within frame 'pfn'.
+ * Note also that this automatically deals correctly with linear p.t.'s.
+ */
+int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn)
+{
+    if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) || (l2e_get_pfn(l2e) == pfn) )
+        return 1;
+
+    if ( l2e_get_flags(l2e) & _PAGE_PSE )
+    {
+        struct page_info *page = l2e_get_page(l2e);
+        unsigned int i;
+
+        for ( i = 0; i < (1u << PAGETABLE_ORDER); i++, page++ )
+            put_page_and_type(page);
+    } else
+        put_page_and_type(l2e_get_page(l2e));
+
+    return 0;
+}
+
+static void put_data_page(struct page_info *page, bool writeable)
+{
+    if ( writeable )
+        put_page_and_type(page);
+    else
+        put_page(page);
+}
+
+int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn, int partial,
+                      bool defer)
+{
+    struct page_info *pg;
+
+    if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) || (l3e_get_pfn(l3e) == pfn) )
+        return 1;
+
+    if ( unlikely(l3e_get_flags(l3e) & _PAGE_PSE) )
+    {
+        unsigned long mfn = l3e_get_pfn(l3e);
+        bool writeable = l3e_get_flags(l3e) & _PAGE_RW;
+
+        ASSERT(!(mfn & ((1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT)) - 1)));
+        do {
+            put_data_page(mfn_to_page(_mfn(mfn)), writeable);
+        } while ( ++mfn & ((1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT)) - 1) );
+
+        return 0;
+    }
+
+    pg = l3e_get_page(l3e);
+
+    if ( unlikely(partial > 0) )
+    {
+        ASSERT(!defer);
+        return put_page_type_preemptible(pg);
+    }
+
+    if ( defer )
+    {
+        current->arch.old_guest_table = pg;
+        return 0;
+    }
+
+    return put_page_and_type_preemptible(pg);
+}
+
+int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn, int partial,
+                      bool defer)
+{
+    if ( (l4e_get_flags(l4e) & _PAGE_PRESENT) &&
+         (l4e_get_pfn(l4e) != pfn) )
+    {
+        struct page_info *pg = l4e_get_page(l4e);
+
+        if ( unlikely(partial > 0) )
+        {
+            ASSERT(!defer);
+            return put_page_type_preemptible(pg);
+        }
+
+        if ( defer )
+        {
+            current->arch.old_guest_table = pg;
+            return 0;
+        }
+
+        return put_page_and_type_preemptible(pg);
+    }
+    return 1;
+}
+
 /*
  * Get a mapping of a PV guest's l1e for this linear address.  The return
  * pointer should be unmapped using unmap_domain_page().
@@ -215,6 +485,526 @@ void pv_arch_init_memory(void)
 #endif
 }
 
+static int alloc_segdesc_page(struct page_info *page)
+{
+    const struct domain *owner = page_get_owner(page);
+    struct desc_struct *descs = __map_domain_page(page);
+    unsigned i;
+
+    for ( i = 0; i < 512; i++ )
+        if ( unlikely(!check_descriptor(owner, &descs[i])) )
+            break;
+
+    unmap_domain_page(descs);
+
+    return i == 512 ? 0 : -EINVAL;
+}
+
+static int alloc_l1_table(struct page_info *page)
+{
+    struct domain *d = page_get_owner(page);
+    l1_pgentry_t  *pl1e;
+    unsigned int   i;
+    int            ret = 0;
+
+    pl1e = __map_domain_page(page);
+
+    for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
+    {
+        switch ( ret = get_page_from_l1e(pl1e[i], d, d) )
+        {
+        default:
+            goto fail;
+        case 0:
+            break;
+        case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+            ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+            l1e_flip_flags(pl1e[i], ret);
+            break;
+        }
+
+        pl1e[i] = adjust_guest_l1e(pl1e[i], d);
+    }
+
+    unmap_domain_page(pl1e);
+    return 0;
+
+ fail:
+    gdprintk(XENLOG_WARNING, "Failure in alloc_l1_table: slot %#x\n", i);
+    while ( i-- > 0 )
+        put_page_from_l1e(pl1e[i], d);
+
+    unmap_domain_page(pl1e);
+    return ret;
+}
+
+int create_pae_xen_mappings(struct domain *d, l3_pgentry_t *pl3e)
+{
+    struct page_info *page;
+    l3_pgentry_t     l3e3;
+
+    if ( !is_pv_32bit_domain(d) )
+        return 1;
+
+    pl3e = (l3_pgentry_t *)((unsigned long)pl3e & PAGE_MASK);
+
+    /* 3rd L3 slot contains L2 with Xen-private mappings. It *must* exist. */
+    l3e3 = pl3e[3];
+    if ( !(l3e_get_flags(l3e3) & _PAGE_PRESENT) )
+    {
+        gdprintk(XENLOG_WARNING, "PAE L3 3rd slot is empty\n");
+        return 0;
+    }
+
+    /*
+     * The Xen-private mappings include linear mappings. The L2 thus cannot
+     * be shared by multiple L3 tables. The test here is adequate because:
+     *  1. Cannot appear in slots != 3 because get_page_type() checks the
+     *     PGT_pae_xen_l2 flag, which is asserted iff the L2 appears in slot 3
+     *  2. Cannot appear in another page table's L3:
+     *     a. alloc_l3_table() calls this function and this check will fail
+     *     b. mod_l3_entry() disallows updates to slot 3 in an existing table
+     */
+    page = l3e_get_page(l3e3);
+    BUG_ON(page->u.inuse.type_info & PGT_pinned);
+    BUG_ON((page->u.inuse.type_info & PGT_count_mask) == 0);
+    BUG_ON(!(page->u.inuse.type_info & PGT_pae_xen_l2));
+    if ( (page->u.inuse.type_info & PGT_count_mask) != 1 )
+    {
+        gdprintk(XENLOG_WARNING, "PAE L3 3rd slot is shared\n");
+        return 0;
+    }
+
+    return 1;
+}
+
+static int alloc_l2_table(struct page_info *page, unsigned long type,
+                          int preemptible)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long  pfn = mfn_x(page_to_mfn(page));
+    l2_pgentry_t  *pl2e;
+    unsigned int   i;
+    int            rc = 0;
+
+    pl2e = map_domain_page(_mfn(pfn));
+
+    for ( i = page->nr_validated_ptes; i < L2_PAGETABLE_ENTRIES; i++ )
+    {
+        if ( preemptible && i > page->nr_validated_ptes
+             && hypercall_preempt_check() )
+        {
+            page->nr_validated_ptes = i;
+            rc = -ERESTART;
+            break;
+        }
+
+        if ( !is_guest_l2_slot(d, type, i) ||
+             (rc = get_page_from_l2e(pl2e[i], pfn, d)) > 0 )
+            continue;
+
+        if ( rc < 0 )
+        {
+            gdprintk(XENLOG_WARNING, "Failure in alloc_l2_table: slot %#x\n", i);
+            while ( i-- > 0 )
+                if ( is_guest_l2_slot(d, type, i) )
+                    put_page_from_l2e(pl2e[i], pfn);
+            break;
+        }
+
+        pl2e[i] = adjust_guest_l2e(pl2e[i], d);
+    }
+
+    if ( rc >= 0 && (type & PGT_pae_xen_l2) )
+    {
+        /* Xen private mappings. */
+        memcpy(&pl2e[COMPAT_L2_PAGETABLE_FIRST_XEN_SLOT(d)],
+               &compat_idle_pg_table_l2[
+                   l2_table_offset(HIRO_COMPAT_MPT_VIRT_START)],
+               COMPAT_L2_PAGETABLE_XEN_SLOTS(d) * sizeof(*pl2e));
+    }
+
+    unmap_domain_page(pl2e);
+    return rc > 0 ? 0 : rc;
+}
+
+static int alloc_l3_table(struct page_info *page)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long  pfn = mfn_x(page_to_mfn(page));
+    l3_pgentry_t  *pl3e;
+    unsigned int   i;
+    int            rc = 0, partial = page->partial_pte;
+
+    pl3e = map_domain_page(_mfn(pfn));
+
+    /*
+     * PAE guests allocate full pages, but aren't required to initialize
+     * more than the first four entries; when running in compatibility
+     * mode, however, the full page is visible to the MMU, and hence all
+     * 512 entries must be valid/verified, which is most easily achieved
+     * by clearing them out.
+     */
+    if ( is_pv_32bit_domain(d) )
+        memset(pl3e + 4, 0, (L3_PAGETABLE_ENTRIES - 4) * sizeof(*pl3e));
+
+    for ( i = page->nr_validated_ptes; i < L3_PAGETABLE_ENTRIES;
+          i++, partial = 0 )
+    {
+        if ( is_pv_32bit_domain(d) && (i == 3) )
+        {
+            if ( !(l3e_get_flags(pl3e[i]) & _PAGE_PRESENT) ||
+                 (l3e_get_flags(pl3e[i]) & l3_disallow_mask(d)) )
+                rc = -EINVAL;
+            else
+                rc = get_page_and_type_from_mfn(
+                    l3e_get_mfn(pl3e[i]),
+                    PGT_l2_page_table | PGT_pae_xen_l2, d, partial, 1);
+        }
+        else if ( (rc = get_page_from_l3e(pl3e[i], pfn, d, partial)) > 0 )
+            continue;
+
+        if ( rc == -ERESTART )
+        {
+            page->nr_validated_ptes = i;
+            page->partial_pte = partial ?: 1;
+        }
+        else if ( rc == -EINTR && i )
+        {
+            page->nr_validated_ptes = i;
+            page->partial_pte = 0;
+            rc = -ERESTART;
+        }
+        if ( rc < 0 )
+            break;
+
+        pl3e[i] = adjust_guest_l3e(pl3e[i], d);
+    }
+
+    if ( rc >= 0 && !create_pae_xen_mappings(d, pl3e) )
+        rc = -EINVAL;
+    if ( rc < 0 && rc != -ERESTART && rc != -EINTR )
+    {
+        gdprintk(XENLOG_WARNING, "Failure in alloc_l3_table: slot %#x\n", i);
+        if ( i )
+        {
+            page->nr_validated_ptes = i;
+            page->partial_pte = 0;
+            current->arch.old_guest_table = page;
+        }
+        while ( i-- > 0 )
+            pl3e[i] = unadjust_guest_l3e(pl3e[i], d);
+    }
+
+    unmap_domain_page(pl3e);
+    return rc > 0 ? 0 : rc;
+}
+
+static int alloc_l4_table(struct page_info *page)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long  pfn = mfn_x(page_to_mfn(page));
+    l4_pgentry_t  *pl4e = map_domain_page(_mfn(pfn));
+    unsigned int   i;
+    int            rc = 0, partial = page->partial_pte;
+
+    for ( i = page->nr_validated_ptes; i < L4_PAGETABLE_ENTRIES;
+          i++, partial = 0 )
+    {
+        if ( !is_guest_l4_slot(d, i) ||
+             (rc = get_page_from_l4e(pl4e[i], pfn, d, partial)) > 0 )
+            continue;
+
+        if ( rc == -ERESTART )
+        {
+            page->nr_validated_ptes = i;
+            page->partial_pte = partial ?: 1;
+        }
+        else if ( rc < 0 )
+        {
+            if ( rc != -EINTR )
+                gdprintk(XENLOG_WARNING,
+                         "Failure in alloc_l4_table: slot %#x\n", i);
+            if ( i )
+            {
+                page->nr_validated_ptes = i;
+                page->partial_pte = 0;
+                if ( rc == -EINTR )
+                    rc = -ERESTART;
+                else
+                {
+                    if ( current->arch.old_guest_table )
+                        page->nr_validated_ptes++;
+                    current->arch.old_guest_table = page;
+                }
+            }
+        }
+        if ( rc < 0 )
+        {
+            unmap_domain_page(pl4e);
+            return rc;
+        }
+
+        pl4e[i] = adjust_guest_l4e(pl4e[i], d);
+    }
+
+    if ( rc >= 0 )
+    {
+        init_guest_l4_table(pl4e, d, !VM_ASSIST(d, m2p_strict));
+        atomic_inc(&d->arch.pv_domain.nr_l4_pages);
+        rc = 0;
+    }
+    unmap_domain_page(pl4e);
+
+    return rc;
+}
+
+int pv_alloc_page_type(struct page_info *page, unsigned long type,
+                       int preemptible)
+{
+    struct domain *owner = page_get_owner(page);
+    int rc;
+
+    /* A page table is dirtied when its type count becomes non-zero. */
+    if ( likely(owner != NULL) )
+        paging_mark_dirty(owner, page_to_mfn(page));
+
+    switch ( type & PGT_type_mask )
+    {
+    case PGT_l1_page_table:
+        rc = alloc_l1_table(page);
+        break;
+    case PGT_l2_page_table:
+        rc = alloc_l2_table(page, type, preemptible);
+        break;
+    case PGT_l3_page_table:
+        ASSERT(preemptible);
+        rc = alloc_l3_table(page);
+        break;
+    case PGT_l4_page_table:
+        ASSERT(preemptible);
+        rc = alloc_l4_table(page);
+        break;
+    case PGT_seg_desc_page:
+        rc = alloc_segdesc_page(page);
+        break;
+    default:
+        printk("Bad type in %s %lx t=%" PRtype_info " c=%lx\n", __func__,
+               type, page->u.inuse.type_info,
+               page->count_info);
+        rc = -EINVAL;
+        BUG();
+    }
+
+    /* No need for atomic update of type_info here: no one else updates it. */
+    smp_wmb();
+    switch ( rc )
+    {
+    case 0:
+        page->u.inuse.type_info |= PGT_validated;
+        break;
+    case -EINTR:
+        ASSERT((page->u.inuse.type_info &
+                (PGT_count_mask|PGT_validated|PGT_partial)) == 1);
+        page->u.inuse.type_info &= ~PGT_count_mask;
+        break;
+    default:
+        ASSERT(rc < 0);
+        gdprintk(XENLOG_WARNING, "Error while validating mfn %" PRI_mfn
+                 " (pfn %" PRI_pfn ") for type %" PRtype_info
+                 ": caf=%08lx taf=%" PRtype_info "\n",
+                 mfn_x(page_to_mfn(page)),
+                 get_gpfn_from_mfn(mfn_x(page_to_mfn(page))),
+                 type, page->count_info, page->u.inuse.type_info);
+        if ( page != current->arch.old_guest_table )
+            page->u.inuse.type_info = 0;
+        else
+        {
+            ASSERT((page->u.inuse.type_info &
+                    (PGT_count_mask | PGT_validated)) == 1);
+    case -ERESTART:
+            get_page_light(page);
+            page->u.inuse.type_info |= PGT_partial;
+        }
+        break;
+    }
+
+    return rc;
+}
+
+static void free_l1_table(struct page_info *page)
+{
+    struct domain *d = page_get_owner(page);
+    l1_pgentry_t *pl1e;
+    unsigned int  i;
+
+    pl1e = __map_domain_page(page);
+
+    for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
+        put_page_from_l1e(pl1e[i], d);
+
+    unmap_domain_page(pl1e);
+}
+
+
+static int free_l2_table(struct page_info *page, int preemptible)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long pfn = mfn_x(page_to_mfn(page));
+    l2_pgentry_t *pl2e;
+    unsigned int  i = page->nr_validated_ptes - 1;
+    int err = 0;
+
+    pl2e = map_domain_page(_mfn(pfn));
+
+    ASSERT(page->nr_validated_ptes);
+    do {
+        if ( is_guest_l2_slot(d, page->u.inuse.type_info, i) &&
+             put_page_from_l2e(pl2e[i], pfn) == 0 &&
+             preemptible && i && hypercall_preempt_check() )
+        {
+            page->nr_validated_ptes = i;
+            err = -ERESTART;
+        }
+    } while ( !err && i-- );
+
+    unmap_domain_page(pl2e);
+
+    if ( !err )
+        page->u.inuse.type_info &= ~PGT_pae_xen_l2;
+
+    return err;
+}
+
+static int free_l3_table(struct page_info *page)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long pfn = mfn_x(page_to_mfn(page));
+    l3_pgentry_t *pl3e;
+    int rc = 0, partial = page->partial_pte;
+    unsigned int  i = page->nr_validated_ptes - !partial;
+
+    pl3e = map_domain_page(_mfn(pfn));
+
+    do {
+        rc = put_page_from_l3e(pl3e[i], pfn, partial, 0);
+        if ( rc < 0 )
+            break;
+        partial = 0;
+        if ( rc > 0 )
+            continue;
+        pl3e[i] = unadjust_guest_l3e(pl3e[i], d);
+    } while ( i-- );
+
+    unmap_domain_page(pl3e);
+
+    if ( rc == -ERESTART )
+    {
+        page->nr_validated_ptes = i;
+        page->partial_pte = partial ?: -1;
+    }
+    else if ( rc == -EINTR && i < L3_PAGETABLE_ENTRIES - 1 )
+    {
+        page->nr_validated_ptes = i + 1;
+        page->partial_pte = 0;
+        rc = -ERESTART;
+    }
+    return rc > 0 ? 0 : rc;
+}
+
+static int free_l4_table(struct page_info *page)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long pfn = mfn_x(page_to_mfn(page));
+    l4_pgentry_t *pl4e = map_domain_page(_mfn(pfn));
+    int rc = 0, partial = page->partial_pte;
+    unsigned int  i = page->nr_validated_ptes - !partial;
+
+    do {
+        if ( is_guest_l4_slot(d, i) )
+            rc = put_page_from_l4e(pl4e[i], pfn, partial, 0);
+        if ( rc < 0 )
+            break;
+        partial = 0;
+    } while ( i-- );
+
+    if ( rc == -ERESTART )
+    {
+        page->nr_validated_ptes = i;
+        page->partial_pte = partial ?: -1;
+    }
+    else if ( rc == -EINTR && i < L4_PAGETABLE_ENTRIES - 1 )
+    {
+        page->nr_validated_ptes = i + 1;
+        page->partial_pte = 0;
+        rc = -ERESTART;
+    }
+
+    unmap_domain_page(pl4e);
+
+    if ( rc >= 0 )
+    {
+        atomic_dec(&d->arch.pv_domain.nr_l4_pages);
+        rc = 0;
+    }
+
+    return rc;
+}
+
+int pv_free_page_type(struct page_info *page, unsigned long type,
+                      int preemptible)
+{
+    struct domain *owner = page_get_owner(page);
+    unsigned long gmfn;
+    int rc;
+
+    if ( likely(owner != NULL) && unlikely(paging_mode_enabled(owner)) )
+    {
+        /* A page table is dirtied when its type count becomes zero. */
+        paging_mark_dirty(owner, page_to_mfn(page));
+
+        ASSERT(!shadow_mode_refcounts(owner));
+
+        gmfn = mfn_to_gmfn(owner, mfn_x(page_to_mfn(page)));
+        ASSERT(VALID_M2P(gmfn));
+        /* Page sharing not supported for shadowed domains */
+        if ( !SHARED_M2P(gmfn) )
+            shadow_remove_all_shadows(owner, _mfn(gmfn));
+    }
+
+    if ( !(type & PGT_partial) )
+    {
+        page->nr_validated_ptes = 1U << PAGETABLE_ORDER;
+        page->partial_pte = 0;
+    }
+
+    switch ( type & PGT_type_mask )
+    {
+    case PGT_l1_page_table:
+        free_l1_table(page);
+        rc = 0;
+        break;
+    case PGT_l2_page_table:
+        rc = free_l2_table(page, preemptible);
+        break;
+    case PGT_l3_page_table:
+        ASSERT(preemptible);
+        rc = free_l3_table(page);
+        break;
+    case PGT_l4_page_table:
+        ASSERT(preemptible);
+        rc = free_l4_table(page);
+        break;
+    default:
+        gdprintk(XENLOG_WARNING, "type %" PRtype_info " mfn %" PRI_mfn "\n",
+                 type, mfn_x(page_to_mfn(page)));
+        rc = -EINVAL;
+        BUG();
+    }
+
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread
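
A quick illustration of the pattern this patch moves: each
get_*_linear_pagetable() helper takes a type reference only if the
candidate page is an *already validated* page table of the expected
level, using a cmpxchg loop on the combined type/count word. The
stand-alone sketch below shows the same check-then-increment shape in
user-space C, with C11 stdatomic standing in for Xen's cmpxchg; the
PGT_* bit layout is purely illustrative, not Xen's real encoding.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative bit layout only -- not Xen's real PGT_* encoding. */
#define PGT_count_mask    0x0000ffffu
#define PGT_type_mask     0x00ff0000u
#define PGT_validated     0x01000000u
#define PGT_l2_page_table 0x00020000u

static bool get_type_ref(_Atomic unsigned int *type_info, unsigned int type)
{
    unsigned int x = atomic_load(type_info);

    do {
        /* Refuse if the count would overflow, or if the page is not a
         * validated table of the expected level. */
        if ( (x & PGT_count_mask) == PGT_count_mask ||
             (x & (PGT_type_mask | PGT_validated)) !=
             (type | PGT_validated) )
            return false;
        /* On failure, x is reloaded with the current value and the
         * checks above are re-evaluated. */
    } while ( !atomic_compare_exchange_weak(type_info, &x, x + 1) );

    return true;
}

int main(void)
{
    _Atomic unsigned int ti = PGT_l2_page_table | PGT_validated | 1;

    printf("validated table: ref %staken\n",
           get_type_ref(&ti, PGT_l2_page_table) ? "" : "not ");

    ti = PGT_l2_page_table | 1;          /* validation still pending */
    printf("pending table:   ref %staken\n",
           get_type_ref(&ti, PGT_l2_page_table) ? "" : "not ");
    return 0;
}

Refusing the reference while PGT_validated is clear is what avoids the
deadlock described in the comment above define_get_linear_pagetable():
validating one table never waits on a reference taken against a table
whose own validation has not completed.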

* [PATCH v5 21/23] x86/mm: move and add pv prefix to invalidate_shadow_ldt
  2017-09-14 12:58 [PATCH v5 00/23] x86: refactor mm.c Wei Liu
                   ` (19 preceding siblings ...)
  2017-09-14 12:58 ` [PATCH v5 20/23] x86/mm: split out PV mm code to pv/mm.c Wei Liu
@ 2017-09-14 12:58 ` Wei Liu
  2017-09-14 12:58 ` [PATCH v5 22/23] x86/mm: split out PV mm hypercalls to pv/mm-hypercalls.c Wei Liu
  2017-09-14 12:58 ` [PATCH v5 23/23] x86/mm: remove the now unused inclusion of pv/mm.h Wei Liu
  22 siblings, 0 replies; 57+ messages in thread
From: Wei Liu @ 2017-09-14 12:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

invalidate_shadow_ldt is needed by both common mm code and PV code.
Move it to pv/mm.c and export it via asm-x86/pv/mm.h. Switch the flush
parameter to bool while moving.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c           | 44 ++++----------------------------------------
 xen/arch/x86/pv/mm.c        | 35 +++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/pv/mm.h |  4 ++++
 3 files changed, 43 insertions(+), 40 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 3f8d22650d..26e3492597 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -487,42 +487,6 @@ static inline void page_set_tlbflush_timestamp(struct page_info *page)
 const char __section(".bss.page_aligned.const") __aligned(PAGE_SIZE)
     zero_page[PAGE_SIZE];
 
-static void invalidate_shadow_ldt(struct vcpu *v, int flush)
-{
-    l1_pgentry_t *pl1e;
-    unsigned int i;
-    struct page_info *page;
-
-    BUG_ON(unlikely(in_irq()));
-
-    spin_lock(&v->arch.pv_vcpu.shadow_ldt_lock);
-
-    if ( v->arch.pv_vcpu.shadow_ldt_mapcnt == 0 )
-        goto out;
-
-    v->arch.pv_vcpu.shadow_ldt_mapcnt = 0;
-    pl1e = pv_ldt_ptes(v);
-
-    for ( i = 0; i < 16; i++ )
-    {
-        if ( !(l1e_get_flags(pl1e[i]) & _PAGE_PRESENT) )
-            continue;
-        page = l1e_get_page(pl1e[i]);
-        l1e_write(&pl1e[i], l1e_empty());
-        ASSERT_PAGE_IS_TYPE(page, PGT_seg_desc_page);
-        ASSERT_PAGE_IS_DOMAIN(page, v->domain);
-        put_page_and_type(page);
-    }
-
-    /* Rid TLBs of stale mappings (guest mappings and shadow mappings). */
-    if ( flush )
-        flush_tlb_mask(v->vcpu_dirty_cpumask);
-
- out:
-    spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock);
-}
-
-
 bool is_iomem_page(mfn_t mfn)
 {
     struct page_info *page;
@@ -864,7 +828,7 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
              (l1e_owner == pg_owner) )
         {
             for_each_vcpu ( pg_owner, v )
-                invalidate_shadow_ldt(v, 1);
+                pv_invalidate_shadow_ldt(v, 1);
         }
         put_page(page);
     }
@@ -1670,7 +1634,7 @@ int new_guest_cr3(mfn_t mfn)
             return rc;
         }
 
-        invalidate_shadow_ldt(curr, 0);
+        pv_invalidate_shadow_ldt(curr, 0);
         write_ptbase(curr);
 
         return 0;
@@ -1708,7 +1672,7 @@ int new_guest_cr3(mfn_t mfn)
         return rc;
     }
 
-    invalidate_shadow_ldt(curr, 0);
+    pv_invalidate_shadow_ldt(curr, 0);
 
     if ( !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) )
         fill_ro_mpt(mfn);
@@ -2209,7 +2173,7 @@ long do_mmuext_op(
             else if ( (curr->arch.pv_vcpu.ldt_ents != ents) ||
                       (curr->arch.pv_vcpu.ldt_base != ptr) )
             {
-                invalidate_shadow_ldt(curr, 0);
+                pv_invalidate_shadow_ldt(curr, 0);
                 flush_tlb_local();
                 curr->arch.pv_vcpu.ldt_base = ptr;
                 curr->arch.pv_vcpu.ldt_ents = ents;
diff --git a/xen/arch/x86/pv/mm.c b/xen/arch/x86/pv/mm.c
index 47cdf58dcf..9415951a05 100644
--- a/xen/arch/x86/pv/mm.c
+++ b/xen/arch/x86/pv/mm.c
@@ -1005,6 +1005,41 @@ int pv_free_page_type(struct page_info *page, unsigned long type,
     return rc;
 }
 
+void pv_invalidate_shadow_ldt(struct vcpu *v, bool flush)
+{
+    l1_pgentry_t *pl1e;
+    unsigned int i;
+    struct page_info *page;
+
+    BUG_ON(unlikely(in_irq()));
+
+    spin_lock(&v->arch.pv_vcpu.shadow_ldt_lock);
+
+    if ( v->arch.pv_vcpu.shadow_ldt_mapcnt == 0 )
+        goto out;
+
+    v->arch.pv_vcpu.shadow_ldt_mapcnt = 0;
+    pl1e = pv_ldt_ptes(v);
+
+    for ( i = 0; i < 16; i++ )
+    {
+        if ( !(l1e_get_flags(pl1e[i]) & _PAGE_PRESENT) )
+            continue;
+        page = l1e_get_page(pl1e[i]);
+        l1e_write(&pl1e[i], l1e_empty());
+        ASSERT_PAGE_IS_TYPE(page, PGT_seg_desc_page);
+        ASSERT_PAGE_IS_DOMAIN(page, v->domain);
+        put_page_and_type(page);
+    }
+
+    /* Rid TLBs of stale mappings (guest mappings and shadow mappings). */
+    if ( flush )
+        flush_tlb_mask(v->vcpu_dirty_cpumask);
+
+ out:
+    spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/pv/mm.h b/xen/include/asm-x86/pv/mm.h
index 0cd8beec39..9c2366bf23 100644
--- a/xen/include/asm-x86/pv/mm.h
+++ b/xen/include/asm-x86/pv/mm.h
@@ -37,6 +37,8 @@ int pv_alloc_page_type(struct page_info *page, unsigned long type,
 int pv_free_page_type(struct page_info *page, unsigned long type,
                       int preemptible);
 
+void pv_invalidate_shadow_ldt(struct vcpu *v, bool flush);
+
 #else
 
 #include <xen/errno.h>
@@ -63,6 +65,8 @@ static inline int pv_free_page_type(struct page_info *page, unsigned long type,
                                     int preemptible)
 { BUG(); return -EINVAL; }
 
+static inline void pv_invalidate_shadow_ldt(struct vcpu *v, bool flush) {}
+
 #endif
 
 #endif /* __X86_PV_MM_H__ */
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread
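
The asm-x86/pv/mm.h hunk above shows the pattern the series uses for
cross-component calls: common code calls the pv_ function
unconditionally, and a no-op static inline keeps builds without PV
support compiling with no #ifdef at each call site. Below is a minimal
self-contained sketch of that shape (simplified types, single file; not
the real header or implementation):

#include <stdbool.h>
#include <stdio.h>

struct vcpu { int vcpu_id; };

#ifdef CONFIG_PV
void pv_invalidate_shadow_ldt(struct vcpu *v, bool flush);
#else
/* Stub: compiles away entirely when PV support is not configured. */
static inline void pv_invalidate_shadow_ldt(struct vcpu *v, bool flush) {}
#endif

#ifdef CONFIG_PV
/* Stand-in for the real implementation in pv/mm.c. */
void pv_invalidate_shadow_ldt(struct vcpu *v, bool flush)
{
    printf("vcpu%d: shadow LDT invalidated%s\n",
           v->vcpu_id, flush ? ", TLBs flushed" : "");
}
#endif

int main(void)
{
    struct vcpu v = { .vcpu_id = 0 };

    /* Call sites look identical with or without CONFIG_PV. */
    pv_invalidate_shadow_ldt(&v, false);
    return 0;
}

One nit visible in the mm.c hunks: although the parameter is now bool,
the converted call sites still pass 0 and 1 rather than false and true.
C converts silently, so this is harmless, but true/false would match
the new signature better.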

* [PATCH v5 22/23] x86/mm: split out PV mm hypercalls to pv/mm-hypercalls.c
  2017-09-14 12:58 [PATCH v5 00/23] x86: refactor mm.c Wei Liu
                   ` (20 preceding siblings ...)
  2017-09-14 12:58 ` [PATCH v5 21/23] x86/mm: move and add pv prefix to invalidate_shadow_ldt Wei Liu
@ 2017-09-14 12:58 ` Wei Liu
  2017-09-14 12:58 ` [PATCH v5 23/23] x86/mm: remove the now unused inclusion of pv/mm.h Wei Liu
  22 siblings, 0 replies; 57+ messages in thread
From: Wei Liu @ 2017-09-14 12:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Also move new_guest_cr3 into the new file so that mod_l1_entry doesn't
have to be exported.

While moving, fix coding style issues: rename v to curr and d to currd,
and change u64 to uint64_t where appropriate.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
I can't convince git diff to produce a sensible diff for donate_page
and steal_page. Those functions are unchanged.
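
(For a pure code motion like this, passing --patience or --histogram to
git format-patch, or using -O with an orderfile, can sometimes coax git
into a more readable hunk layout; whether it helps in this case is
untested.)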
---
 xen/arch/x86/mm.c               | 1570 ++-------------------------------------
 xen/arch/x86/pv/Makefile        |    1 +
 xen/arch/x86/pv/mm-hypercalls.c | 1461 ++++++++++++++++++++++++++++++++++++
 3 files changed, 1535 insertions(+), 1497 deletions(-)
 create mode 100644 xen/arch/x86/pv/mm-hypercalls.c

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 26e3492597..2ffcc53c6c 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -885,283 +885,6 @@ void page_unlock(struct page_info *page)
     } while ( (y = cmpxchg(&page->u.inuse.type_info, x, nx)) != x );
 }
 
-/*
- * PTE flags that a guest may change without re-validating the PTE.
- * All other bits affect translation, caching, or Xen's safety.
- */
-#define FASTPATH_FLAG_WHITELIST                                     \
-    (_PAGE_NX_BIT | _PAGE_AVAIL_HIGH | _PAGE_AVAIL | _PAGE_GLOBAL | \
-     _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_USER)
-
-/* Update the L1 entry at pl1e to new value nl1e. */
-static int mod_l1_entry(l1_pgentry_t *pl1e, l1_pgentry_t nl1e,
-                        unsigned long gl1mfn, int preserve_ad,
-                        struct vcpu *pt_vcpu, struct domain *pg_dom)
-{
-    l1_pgentry_t ol1e;
-    struct domain *pt_dom = pt_vcpu->domain;
-    int rc = 0;
-
-    if ( unlikely(__copy_from_user(&ol1e, pl1e, sizeof(ol1e)) != 0) )
-        return -EFAULT;
-
-    ASSERT(!paging_mode_refcounts(pt_dom));
-
-    if ( l1e_get_flags(nl1e) & _PAGE_PRESENT )
-    {
-        /* Translate foreign guest addresses. */
-        struct page_info *page = NULL;
-
-        if ( unlikely(l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom)) )
-        {
-            gdprintk(XENLOG_WARNING, "Bad L1 flags %x\n",
-                    l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom));
-            return -EINVAL;
-        }
-
-        if ( paging_mode_translate(pg_dom) )
-        {
-            page = get_page_from_gfn(pg_dom, l1e_get_pfn(nl1e), NULL, P2M_ALLOC);
-            if ( !page )
-                return -EINVAL;
-            nl1e = l1e_from_page(page, l1e_get_flags(nl1e));
-        }
-
-        /* Fast path for sufficiently-similar mappings. */
-        if ( !l1e_has_changed(ol1e, nl1e, ~FASTPATH_FLAG_WHITELIST) )
-        {
-            nl1e = adjust_guest_l1e(nl1e, pt_dom);
-            rc = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
-                              preserve_ad);
-            if ( page )
-                put_page(page);
-            return rc ? 0 : -EBUSY;
-        }
-
-        switch ( rc = get_page_from_l1e(nl1e, pt_dom, pg_dom) )
-        {
-        default:
-            if ( page )
-                put_page(page);
-            return rc;
-        case 0:
-            break;
-        case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
-            ASSERT(!(rc & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
-            l1e_flip_flags(nl1e, rc);
-            rc = 0;
-            break;
-        }
-        if ( page )
-            put_page(page);
-
-        nl1e = adjust_guest_l1e(nl1e, pt_dom);
-        if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
-                                    preserve_ad)) )
-        {
-            ol1e = nl1e;
-            rc = -EBUSY;
-        }
-    }
-    else if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
-                                     preserve_ad)) )
-    {
-        return -EBUSY;
-    }
-
-    put_page_from_l1e(ol1e, pt_dom);
-    return rc;
-}
-
-
-/* Update the L2 entry at pl2e to new value nl2e. pl2e is within frame pfn. */
-static int mod_l2_entry(l2_pgentry_t *pl2e,
-                        l2_pgentry_t nl2e,
-                        unsigned long pfn,
-                        int preserve_ad,
-                        struct vcpu *vcpu)
-{
-    l2_pgentry_t ol2e;
-    struct domain *d = vcpu->domain;
-    struct page_info *l2pg = mfn_to_page(_mfn(pfn));
-    unsigned long type = l2pg->u.inuse.type_info;
-    int rc = 0;
-
-    if ( unlikely(!is_guest_l2_slot(d, type, pgentry_ptr_to_slot(pl2e))) )
-    {
-        gdprintk(XENLOG_WARNING, "L2 update in Xen-private area, slot %#lx\n",
-                 pgentry_ptr_to_slot(pl2e));
-        return -EPERM;
-    }
-
-    if ( unlikely(__copy_from_user(&ol2e, pl2e, sizeof(ol2e)) != 0) )
-        return -EFAULT;
-
-    if ( l2e_get_flags(nl2e) & _PAGE_PRESENT )
-    {
-        if ( unlikely(l2e_get_flags(nl2e) & L2_DISALLOW_MASK) )
-        {
-            gdprintk(XENLOG_WARNING, "Bad L2 flags %x\n",
-                    l2e_get_flags(nl2e) & L2_DISALLOW_MASK);
-            return -EINVAL;
-        }
-
-        /* Fast path for sufficiently-similar mappings. */
-        if ( !l2e_has_changed(ol2e, nl2e, ~FASTPATH_FLAG_WHITELIST) )
-        {
-            nl2e = adjust_guest_l2e(nl2e, d);
-            if ( UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu, preserve_ad) )
-                return 0;
-            return -EBUSY;
-        }
-
-        if ( unlikely((rc = get_page_from_l2e(nl2e, pfn, d)) < 0) )
-            return rc;
-
-        nl2e = adjust_guest_l2e(nl2e, d);
-        if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu,
-                                    preserve_ad)) )
-        {
-            ol2e = nl2e;
-            rc = -EBUSY;
-        }
-    }
-    else if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu,
-                                     preserve_ad)) )
-    {
-        return -EBUSY;
-    }
-
-    put_page_from_l2e(ol2e, pfn);
-    return rc;
-}
-
-/* Update the L3 entry at pl3e to new value nl3e. pl3e is within frame pfn. */
-static int mod_l3_entry(l3_pgentry_t *pl3e,
-                        l3_pgentry_t nl3e,
-                        unsigned long pfn,
-                        int preserve_ad,
-                        struct vcpu *vcpu)
-{
-    l3_pgentry_t ol3e;
-    struct domain *d = vcpu->domain;
-    int rc = 0;
-
-    /*
-     * Disallow updates to final L3 slot. It contains Xen mappings, and it
-     * would be a pain to ensure they remain continuously valid throughout.
-     */
-    if ( is_pv_32bit_domain(d) && (pgentry_ptr_to_slot(pl3e) >= 3) )
-        return -EINVAL;
-
-    if ( unlikely(__copy_from_user(&ol3e, pl3e, sizeof(ol3e)) != 0) )
-        return -EFAULT;
-
-    if ( l3e_get_flags(nl3e) & _PAGE_PRESENT )
-    {
-        if ( unlikely(l3e_get_flags(nl3e) & l3_disallow_mask(d)) )
-        {
-            gdprintk(XENLOG_WARNING, "Bad L3 flags %x\n",
-                    l3e_get_flags(nl3e) & l3_disallow_mask(d));
-            return -EINVAL;
-        }
-
-        /* Fast path for sufficiently-similar mappings. */
-        if ( !l3e_has_changed(ol3e, nl3e, ~FASTPATH_FLAG_WHITELIST) )
-        {
-            nl3e = adjust_guest_l3e(nl3e, d);
-            rc = UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu, preserve_ad);
-            return rc ? 0 : -EFAULT;
-        }
-
-        rc = get_page_from_l3e(nl3e, pfn, d, 0);
-        if ( unlikely(rc < 0) )
-            return rc;
-        rc = 0;
-
-        nl3e = adjust_guest_l3e(nl3e, d);
-        if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu,
-                                    preserve_ad)) )
-        {
-            ol3e = nl3e;
-            rc = -EFAULT;
-        }
-    }
-    else if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu,
-                                     preserve_ad)) )
-    {
-        return -EFAULT;
-    }
-
-    if ( likely(rc == 0) )
-        if ( !create_pae_xen_mappings(d, pl3e) )
-            BUG();
-
-    put_page_from_l3e(ol3e, pfn, 0, 1);
-    return rc;
-}
-
-/* Update the L4 entry at pl4e to new value nl4e. pl4e is within frame pfn. */
-static int mod_l4_entry(l4_pgentry_t *pl4e,
-                        l4_pgentry_t nl4e,
-                        unsigned long pfn,
-                        int preserve_ad,
-                        struct vcpu *vcpu)
-{
-    struct domain *d = vcpu->domain;
-    l4_pgentry_t ol4e;
-    int rc = 0;
-
-    if ( unlikely(!is_guest_l4_slot(d, pgentry_ptr_to_slot(pl4e))) )
-    {
-        gdprintk(XENLOG_WARNING, "L4 update in Xen-private area, slot %#lx\n",
-                 pgentry_ptr_to_slot(pl4e));
-        return -EINVAL;
-    }
-
-    if ( unlikely(__copy_from_user(&ol4e, pl4e, sizeof(ol4e)) != 0) )
-        return -EFAULT;
-
-    if ( l4e_get_flags(nl4e) & _PAGE_PRESENT )
-    {
-        if ( unlikely(l4e_get_flags(nl4e) & L4_DISALLOW_MASK) )
-        {
-            gdprintk(XENLOG_WARNING, "Bad L4 flags %x\n",
-                    l4e_get_flags(nl4e) & L4_DISALLOW_MASK);
-            return -EINVAL;
-        }
-
-        /* Fast path for sufficiently-similar mappings. */
-        if ( !l4e_has_changed(ol4e, nl4e, ~FASTPATH_FLAG_WHITELIST) )
-        {
-            nl4e = adjust_guest_l4e(nl4e, d);
-            rc = UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu, preserve_ad);
-            return rc ? 0 : -EFAULT;
-        }
-
-        rc = get_page_from_l4e(nl4e, pfn, d, 0);
-        if ( unlikely(rc < 0) )
-            return rc;
-        rc = 0;
-
-        nl4e = adjust_guest_l4e(nl4e, d);
-        if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu,
-                                    preserve_ad)) )
-        {
-            ol4e = nl4e;
-            rc = -EFAULT;
-        }
-    }
-    else if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu,
-                                     preserve_ad)) )
-    {
-        return -EFAULT;
-    }
-
-    put_page_from_l4e(ol4e, pfn, 0, 1);
-    return rc;
-}
-
 static int cleanup_page_cacheattr(struct page_info *page)
 {
     unsigned int cacheattr =
@@ -1602,1132 +1325,101 @@ int vcpu_destroy_pagetables(struct vcpu *v)
     return rc != -EINTR ? rc : -ERESTART;
 }
 
-int new_guest_cr3(mfn_t mfn)
+int donate_page(
+    struct domain *d, struct page_info *page, unsigned int memflags)
 {
-    struct vcpu *curr = current;
-    struct domain *d = curr->domain;
-    int rc;
-    mfn_t old_base_mfn;
-
-    if ( is_pv_32bit_domain(d) )
-    {
-        mfn_t gt_mfn = pagetable_get_mfn(curr->arch.guest_table);
-        l4_pgentry_t *pl4e = map_domain_page(gt_mfn);
-
-        rc = mod_l4_entry(pl4e,
-                          l4e_from_mfn(mfn,
-                                       (_PAGE_PRESENT | _PAGE_RW |
-                                        _PAGE_USER | _PAGE_ACCESSED)),
-                          mfn_x(gt_mfn), 0, curr);
-        unmap_domain_page(pl4e);
-        switch ( rc )
-        {
-        case 0:
-            break;
-        case -EINTR:
-        case -ERESTART:
-            return -ERESTART;
-        default:
-            gdprintk(XENLOG_WARNING,
-                     "Error while installing new compat baseptr %" PRI_mfn "\n",
-                     mfn_x(mfn));
-            return rc;
-        }
+    const struct domain *owner = dom_xen;
 
-        pv_invalidate_shadow_ldt(curr, 0);
-        write_ptbase(curr);
+    spin_lock(&d->page_alloc_lock);
 
-        return 0;
-    }
+    if ( is_xen_heap_page(page) || ((owner = page_get_owner(page)) != NULL) )
+        goto fail;
 
-    rc = put_old_guest_table(curr);
-    if ( unlikely(rc) )
-        return rc;
+    if ( d->is_dying )
+        goto fail;
 
-    old_base_mfn = pagetable_get_mfn(curr->arch.guest_table);
-    /*
-     * This is particularly important when getting restarted after the
-     * previous attempt got preempted in the put-old-MFN phase.
-     */
-    if ( mfn_eq(old_base_mfn, mfn) )
-    {
-        write_ptbase(curr);
-        return 0;
-    }
+    if ( page->count_info & ~(PGC_allocated | 1) )
+        goto fail;
 
-    rc = paging_mode_refcounts(d)
-         ? (get_page_from_mfn(mfn, d) ? 0 : -EINVAL)
-         : get_page_and_type_from_mfn(mfn, PGT_root_page_table, d, 0, 1);
-    switch ( rc )
+    if ( !(memflags & MEMF_no_refcount) )
     {
-    case 0:
-        break;
-    case -EINTR:
-    case -ERESTART:
-        return -ERESTART;
-    default:
-        gdprintk(XENLOG_WARNING,
-                 "Error while installing new baseptr %" PRI_mfn "\n",
-                 mfn_x(mfn));
-        return rc;
+        if ( d->tot_pages >= d->max_pages )
+            goto fail;
+        domain_adjust_tot_pages(d, 1);
     }
 
-    pv_invalidate_shadow_ldt(curr, 0);
-
-    if ( !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) )
-        fill_ro_mpt(mfn);
-    curr->arch.guest_table = pagetable_from_mfn(mfn);
-    update_cr3(curr);
-
-    write_ptbase(curr);
-
-    if ( likely(mfn_x(old_base_mfn) != 0) )
-    {
-        struct page_info *page = mfn_to_page(old_base_mfn);
+    page->count_info = PGC_allocated | 1;
+    page_set_owner(page, d);
+    page_list_add_tail(page, &d->page_list);
 
-        if ( paging_mode_refcounts(d) )
-            put_page(page);
-        else
-            switch ( rc = put_page_and_type_preemptible(page) )
-            {
-            case -EINTR:
-                rc = -ERESTART;
-                /* fallthrough */
-            case -ERESTART:
-                curr->arch.old_guest_table = page;
-                break;
-            default:
-                BUG_ON(rc);
-                break;
-            }
-    }
+    spin_unlock(&d->page_alloc_lock);
+    return 0;
 
-    return rc;
+ fail:
+    spin_unlock(&d->page_alloc_lock);
+    gdprintk(XENLOG_WARNING, "Bad donate mfn %" PRI_mfn
+             " to d%d (owner d%d) caf=%08lx taf=%" PRtype_info "\n",
+             mfn_x(page_to_mfn(page)), d->domain_id,
+             owner ? owner->domain_id : DOMID_INVALID,
+             page->count_info, page->u.inuse.type_info);
+    return -EINVAL;
 }
 
-static struct domain *get_pg_owner(domid_t domid)
+int steal_page(
+    struct domain *d, struct page_info *page, unsigned int memflags)
 {
-    struct domain *pg_owner = NULL, *curr = current->domain;
+    unsigned long x, y;
+    bool drop_dom_ref = false;
+    const struct domain *owner = dom_xen;
 
-    if ( likely(domid == DOMID_SELF) )
-    {
-        pg_owner = rcu_lock_current_domain();
-        goto out;
-    }
+    if ( paging_mode_external(d) )
+        return -EOPNOTSUPP;
 
-    if ( unlikely(domid == curr->domain_id) )
-    {
-        gdprintk(XENLOG_WARNING, "Cannot specify itself as foreign domain\n");
-        goto out;
-    }
+    spin_lock(&d->page_alloc_lock);
 
-    switch ( domid )
-    {
-    case DOMID_IO:
-        pg_owner = rcu_lock_domain(dom_io);
-        break;
-    case DOMID_XEN:
-        pg_owner = rcu_lock_domain(dom_xen);
-        break;
-    default:
-        if ( (pg_owner = rcu_lock_domain_by_id(domid)) == NULL )
-        {
-            gdprintk(XENLOG_WARNING, "Unknown domain d%d\n", domid);
-            break;
-        }
-        break;
-    }
+    if ( is_xen_heap_page(page) || ((owner = page_get_owner(page)) != d) )
+        goto fail;
 
- out:
-    return pg_owner;
-}
+    /*
+     * We require there is just one reference (PGC_allocated). We temporarily
+     * drop this reference now so that we can safely swizzle the owner.
+     */
+    y = page->count_info;
+    do {
+        x = y;
+        if ( (x & (PGC_count_mask|PGC_allocated)) != (1 | PGC_allocated) )
+            goto fail;
+        y = cmpxchg(&page->count_info, x, x & ~PGC_count_mask);
+    } while ( y != x );
 
-static void put_pg_owner(struct domain *pg_owner)
-{
-    rcu_unlock_domain(pg_owner);
-}
+    /*
+     * With the sole reference dropped temporarily, no-one can update type
+     * information. Type count also needs to be zero in this case, but e.g.
+     * PGT_seg_desc_page may still have PGT_validated set, which we need to
+     * clear before transferring ownership (as validation criteria vary
+     * depending on domain type).
+     */
+    BUG_ON(page->u.inuse.type_info & (PGT_count_mask | PGT_locked |
+                                      PGT_pinned));
+    page->u.inuse.type_info = 0;
 
-static inline int vcpumask_to_pcpumask(
-    struct domain *d, XEN_GUEST_HANDLE_PARAM(const_void) bmap, cpumask_t *pmask)
-{
-    unsigned int vcpu_id, vcpu_bias, offs;
-    unsigned long vmask;
-    struct vcpu *v;
-    bool is_native = !is_pv_32bit_domain(d);
+    /* Swizzle the owner then reinstate the PGC_allocated reference. */
+    page_set_owner(page, NULL);
+    y = page->count_info;
+    do {
+        x = y;
+        BUG_ON((x & (PGC_count_mask|PGC_allocated)) != PGC_allocated);
+    } while ( (y = cmpxchg(&page->count_info, x, x | 1)) != x );
 
-    cpumask_clear(pmask);
-    for ( vmask = 0, offs = 0; ; ++offs )
-    {
-        vcpu_bias = offs * (is_native ? BITS_PER_LONG : 32);
-        if ( vcpu_bias >= d->max_vcpus )
-            return 0;
+    /* Unlink from original owner. */
+    if ( !(memflags & MEMF_no_refcount) && !domain_adjust_tot_pages(d, -1) )
+        drop_dom_ref = true;
+    page_list_del(page, &d->page_list);
 
-        if ( unlikely(is_native ?
-                      copy_from_guest_offset(&vmask, bmap, offs, 1) :
-                      copy_from_guest_offset((unsigned int *)&vmask, bmap,
-                                             offs, 1)) )
-        {
-            cpumask_clear(pmask);
-            return -EFAULT;
-        }
-
-        while ( vmask )
-        {
-            vcpu_id = find_first_set_bit(vmask);
-            vmask &= ~(1UL << vcpu_id);
-            vcpu_id += vcpu_bias;
-            if ( (vcpu_id >= d->max_vcpus) )
-                return 0;
-            if ( ((v = d->vcpu[vcpu_id]) != NULL) )
-                cpumask_or(pmask, pmask, v->vcpu_dirty_cpumask);
-        }
-    }
-}
-
-long do_mmuext_op(
-    XEN_GUEST_HANDLE_PARAM(mmuext_op_t) uops,
-    unsigned int count,
-    XEN_GUEST_HANDLE_PARAM(uint) pdone,
-    unsigned int foreigndom)
-{
-    struct mmuext_op op;
-    unsigned long type;
-    unsigned int i, done = 0;
-    struct vcpu *curr = current;
-    struct domain *currd = curr->domain;
-    struct domain *pg_owner;
-    int rc = put_old_guest_table(curr);
-
-    if ( unlikely(rc) )
-    {
-        if ( likely(rc == -ERESTART) )
-            rc = hypercall_create_continuation(
-                     __HYPERVISOR_mmuext_op, "hihi", uops, count, pdone,
-                     foreigndom);
-        return rc;
-    }
-
-    if ( unlikely(count == MMU_UPDATE_PREEMPTED) &&
-         likely(guest_handle_is_null(uops)) )
-    {
-        /*
-         * See the curr->arch.old_guest_table related
-         * hypercall_create_continuation() below.
-         */
-        return (int)foreigndom;
-    }
-
-    if ( unlikely(count & MMU_UPDATE_PREEMPTED) )
-    {
-        count &= ~MMU_UPDATE_PREEMPTED;
-        if ( unlikely(!guest_handle_is_null(pdone)) )
-            (void)copy_from_guest(&done, pdone, 1);
-    }
-    else
-        perfc_incr(calls_to_mmuext_op);
-
-    if ( unlikely(!guest_handle_okay(uops, count)) )
-        return -EFAULT;
-
-    if ( (pg_owner = get_pg_owner(foreigndom)) == NULL )
-        return -ESRCH;
-
-    if ( !is_pv_domain(pg_owner) )
-    {
-        put_pg_owner(pg_owner);
-        return -EINVAL;
-    }
-
-    rc = xsm_mmuext_op(XSM_TARGET, currd, pg_owner);
-    if ( rc )
-    {
-        put_pg_owner(pg_owner);
-        return rc;
-    }
-
-    for ( i = 0; i < count; i++ )
-    {
-        if ( curr->arch.old_guest_table || (i && hypercall_preempt_check()) )
-        {
-            rc = -ERESTART;
-            break;
-        }
-
-        if ( unlikely(__copy_from_guest(&op, uops, 1) != 0) )
-        {
-            rc = -EFAULT;
-            break;
-        }
-
-        if ( is_hvm_domain(currd) )
-        {
-            switch ( op.cmd )
-            {
-            case MMUEXT_PIN_L1_TABLE:
-            case MMUEXT_PIN_L2_TABLE:
-            case MMUEXT_PIN_L3_TABLE:
-            case MMUEXT_PIN_L4_TABLE:
-            case MMUEXT_UNPIN_TABLE:
-                break;
-            default:
-                rc = -EOPNOTSUPP;
-                goto done;
-            }
-        }
-
-        rc = 0;
-
-        switch ( op.cmd )
-        {
-            struct page_info *page;
-            p2m_type_t p2mt;
-
-        case MMUEXT_PIN_L1_TABLE:
-            type = PGT_l1_page_table;
-            goto pin_page;
-
-        case MMUEXT_PIN_L2_TABLE:
-            type = PGT_l2_page_table;
-            goto pin_page;
-
-        case MMUEXT_PIN_L3_TABLE:
-            type = PGT_l3_page_table;
-            goto pin_page;
-
-        case MMUEXT_PIN_L4_TABLE:
-            if ( is_pv_32bit_domain(pg_owner) )
-                break;
-            type = PGT_l4_page_table;
-
-        pin_page:
-            /* Ignore pinning of invalid paging levels. */
-            if ( (op.cmd - MMUEXT_PIN_L1_TABLE) > (CONFIG_PAGING_LEVELS - 1) )
-                break;
-
-            if ( paging_mode_refcounts(pg_owner) )
-                break;
-
-            page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC);
-            if ( unlikely(!page) )
-            {
-                rc = -EINVAL;
-                break;
-            }
-
-            rc = get_page_type_preemptible(page, type);
-            if ( unlikely(rc) )
-            {
-                if ( rc == -EINTR )
-                    rc = -ERESTART;
-                else if ( rc != -ERESTART )
-                    gdprintk(XENLOG_WARNING,
-                             "Error %d while pinning mfn %" PRI_mfn "\n",
-                             rc, mfn_x(page_to_mfn(page)));
-                if ( page != curr->arch.old_guest_table )
-                    put_page(page);
-                break;
-            }
-
-            rc = xsm_memory_pin_page(XSM_HOOK, currd, pg_owner, page);
-            if ( !rc && unlikely(test_and_set_bit(_PGT_pinned,
-                                                  &page->u.inuse.type_info)) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "mfn %" PRI_mfn " already pinned\n",
-                         mfn_x(page_to_mfn(page)));
-                rc = -EINVAL;
-            }
-
-            if ( unlikely(rc) )
-                goto pin_drop;
-
-            /* A page is dirtied when its pin status is set. */
-            paging_mark_dirty(pg_owner, page_to_mfn(page));
-
-            /* We can race domain destruction (domain_relinquish_resources). */
-            if ( unlikely(pg_owner != currd) )
-            {
-                bool drop_ref;
-
-                spin_lock(&pg_owner->page_alloc_lock);
-                drop_ref = (pg_owner->is_dying &&
-                            test_and_clear_bit(_PGT_pinned,
-                                               &page->u.inuse.type_info));
-                spin_unlock(&pg_owner->page_alloc_lock);
-                if ( drop_ref )
-                {
-        pin_drop:
-                    if ( type == PGT_l1_page_table )
-                        put_page_and_type(page);
-                    else
-                        curr->arch.old_guest_table = page;
-                }
-            }
-            break;
-
-        case MMUEXT_UNPIN_TABLE:
-            if ( paging_mode_refcounts(pg_owner) )
-                break;
-
-            page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC);
-            if ( unlikely(!page) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "mfn %" PRI_mfn " bad, or bad owner d%d\n",
-                         op.arg1.mfn, pg_owner->domain_id);
-                rc = -EINVAL;
-                break;
-            }
-
-            if ( !test_and_clear_bit(_PGT_pinned, &page->u.inuse.type_info) )
-            {
-                put_page(page);
-                gdprintk(XENLOG_WARNING,
-                         "mfn %" PRI_mfn " not pinned\n", op.arg1.mfn);
-                rc = -EINVAL;
-                break;
-            }
-
-            switch ( rc = put_page_and_type_preemptible(page) )
-            {
-            case -EINTR:
-            case -ERESTART:
-                curr->arch.old_guest_table = page;
-                rc = 0;
-                break;
-            default:
-                BUG_ON(rc);
-                break;
-            }
-            put_page(page);
-
-            /* A page is dirtied when its pin status is cleared. */
-            paging_mark_dirty(pg_owner, page_to_mfn(page));
-            break;
-
-        case MMUEXT_NEW_BASEPTR:
-            if ( unlikely(currd != pg_owner) )
-                rc = -EPERM;
-            else if ( unlikely(paging_mode_translate(currd)) )
-                rc = -EINVAL;
-            else
-                rc = new_guest_cr3(_mfn(op.arg1.mfn));
-            break;
-
-        case MMUEXT_NEW_USER_BASEPTR: {
-            unsigned long old_mfn;
-
-            if ( unlikely(currd != pg_owner) )
-                rc = -EPERM;
-            else if ( unlikely(paging_mode_translate(currd)) )
-                rc = -EINVAL;
-            if ( unlikely(rc) )
-                break;
-
-            old_mfn = pagetable_get_pfn(curr->arch.guest_table_user);
-            /*
-             * This is particularly important when getting restarted after the
-             * previous attempt got preempted in the put-old-MFN phase.
-             */
-            if ( old_mfn == op.arg1.mfn )
-                break;
-
-            if ( op.arg1.mfn != 0 )
-            {
-                rc = get_page_and_type_from_mfn(
-                    _mfn(op.arg1.mfn), PGT_root_page_table, currd, 0, 1);
-
-                if ( unlikely(rc) )
-                {
-                    if ( rc == -EINTR )
-                        rc = -ERESTART;
-                    else if ( rc != -ERESTART )
-                        gdprintk(XENLOG_WARNING,
-                                 "Error %d installing new mfn %" PRI_mfn "\n",
-                                 rc, op.arg1.mfn);
-                    break;
-                }
-
-                if ( VM_ASSIST(currd, m2p_strict) )
-                    zap_ro_mpt(_mfn(op.arg1.mfn));
-            }
-
-            curr->arch.guest_table_user = pagetable_from_pfn(op.arg1.mfn);
-
-            if ( old_mfn != 0 )
-            {
-                page = mfn_to_page(_mfn(old_mfn));
-
-                switch ( rc = put_page_and_type_preemptible(page) )
-                {
-                case -EINTR:
-                    rc = -ERESTART;
-                    /* fallthrough */
-                case -ERESTART:
-                    curr->arch.old_guest_table = page;
-                    break;
-                default:
-                    BUG_ON(rc);
-                    break;
-                }
-            }
-
-            break;
-        }
-
-        case MMUEXT_TLB_FLUSH_LOCAL:
-            if ( likely(currd == pg_owner) )
-                flush_tlb_local();
-            else
-                rc = -EPERM;
-            break;
-
-        case MMUEXT_INVLPG_LOCAL:
-            if ( unlikely(currd != pg_owner) )
-                rc = -EPERM;
-            else
-                paging_invlpg(curr, op.arg1.linear_addr);
-            break;
-
-        case MMUEXT_TLB_FLUSH_MULTI:
-        case MMUEXT_INVLPG_MULTI:
-        {
-            cpumask_t *mask = this_cpu(scratch_cpumask);
-
-            if ( unlikely(currd != pg_owner) )
-                rc = -EPERM;
-            else if ( unlikely(vcpumask_to_pcpumask(currd,
-                                   guest_handle_to_param(op.arg2.vcpumask,
-                                                         const_void),
-                                   mask)) )
-                rc = -EINVAL;
-            if ( unlikely(rc) )
-                break;
-
-            if ( op.cmd == MMUEXT_TLB_FLUSH_MULTI )
-                flush_tlb_mask(mask);
-            else if ( __addr_ok(op.arg1.linear_addr) )
-                flush_tlb_one_mask(mask, op.arg1.linear_addr);
-            break;
-        }
-
-        case MMUEXT_TLB_FLUSH_ALL:
-            if ( likely(currd == pg_owner) )
-                flush_tlb_mask(currd->domain_dirty_cpumask);
-            else
-                rc = -EPERM;
-            break;
-
-        case MMUEXT_INVLPG_ALL:
-            if ( unlikely(currd != pg_owner) )
-                rc = -EPERM;
-            else if ( __addr_ok(op.arg1.linear_addr) )
-                flush_tlb_one_mask(currd->domain_dirty_cpumask,
-                                   op.arg1.linear_addr);
-            break;
-
-        case MMUEXT_FLUSH_CACHE:
-            if ( unlikely(currd != pg_owner) )
-                rc = -EPERM;
-            else if ( unlikely(!cache_flush_permitted(currd)) )
-                rc = -EACCES;
-            else
-                wbinvd();
-            break;
-
-        case MMUEXT_FLUSH_CACHE_GLOBAL:
-            if ( unlikely(currd != pg_owner) )
-                rc = -EPERM;
-            else if ( likely(cache_flush_permitted(currd)) )
-            {
-                unsigned int cpu;
-                cpumask_t *mask = this_cpu(scratch_cpumask);
-
-                cpumask_clear(mask);
-                for_each_online_cpu(cpu)
-                    if ( !cpumask_intersects(mask,
-                                             per_cpu(cpu_sibling_mask, cpu)) )
-                        __cpumask_set_cpu(cpu, mask);
-                flush_mask(mask, FLUSH_CACHE);
-            }
-            else
-                rc = -EINVAL;
-            break;
-
-        case MMUEXT_SET_LDT:
-        {
-            unsigned int ents = op.arg2.nr_ents;
-            unsigned long ptr = ents ? op.arg1.linear_addr : 0;
-
-            if ( unlikely(currd != pg_owner) )
-                rc = -EPERM;
-            else if ( paging_mode_external(currd) )
-                rc = -EINVAL;
-            else if ( ((ptr & (PAGE_SIZE - 1)) != 0) || !__addr_ok(ptr) ||
-                      (ents > 8192) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Bad args to SET_LDT: ptr=%lx, ents=%x\n", ptr, ents);
-                rc = -EINVAL;
-            }
-            else if ( (curr->arch.pv_vcpu.ldt_ents != ents) ||
-                      (curr->arch.pv_vcpu.ldt_base != ptr) )
-            {
-                pv_invalidate_shadow_ldt(curr, 0);
-                flush_tlb_local();
-                curr->arch.pv_vcpu.ldt_base = ptr;
-                curr->arch.pv_vcpu.ldt_ents = ents;
-                load_LDT(curr);
-            }
-            break;
-        }
-
-        case MMUEXT_CLEAR_PAGE:
-            page = get_page_from_gfn(pg_owner, op.arg1.mfn, &p2mt, P2M_ALLOC);
-            if ( unlikely(p2mt != p2m_ram_rw) && page )
-            {
-                put_page(page);
-                page = NULL;
-            }
-            if ( !page || !get_page_type(page, PGT_writable_page) )
-            {
-                if ( page )
-                    put_page(page);
-                gdprintk(XENLOG_WARNING,
-                         "Error clearing mfn %" PRI_mfn "\n", op.arg1.mfn);
-                rc = -EINVAL;
-                break;
-            }
-
-            /* A page is dirtied when it's being cleared. */
-            paging_mark_dirty(pg_owner, page_to_mfn(page));
-
-            clear_domain_page(page_to_mfn(page));
-
-            put_page_and_type(page);
-            break;
-
-        case MMUEXT_COPY_PAGE:
-        {
-            struct page_info *src_page, *dst_page;
-
-            src_page = get_page_from_gfn(pg_owner, op.arg2.src_mfn, &p2mt,
-                                         P2M_ALLOC);
-            if ( unlikely(p2mt != p2m_ram_rw) && src_page )
-            {
-                put_page(src_page);
-                src_page = NULL;
-            }
-            if ( unlikely(!src_page) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Error copying from mfn %" PRI_mfn "\n",
-                         op.arg2.src_mfn);
-                rc = -EINVAL;
-                break;
-            }
-
-            dst_page = get_page_from_gfn(pg_owner, op.arg1.mfn, &p2mt,
-                                         P2M_ALLOC);
-            if ( unlikely(p2mt != p2m_ram_rw) && dst_page )
-            {
-                put_page(dst_page);
-                dst_page = NULL;
-            }
-            rc = (dst_page &&
-                  get_page_type(dst_page, PGT_writable_page)) ? 0 : -EINVAL;
-            if ( unlikely(rc) )
-            {
-                put_page(src_page);
-                if ( dst_page )
-                    put_page(dst_page);
-                gdprintk(XENLOG_WARNING,
-                         "Error copying to mfn %" PRI_mfn "\n", op.arg1.mfn);
-                break;
-            }
-
-            /* A page is dirtied when it's being copied to. */
-            paging_mark_dirty(pg_owner, page_to_mfn(dst_page));
-
-            copy_domain_page(page_to_mfn(dst_page), page_to_mfn(src_page));
-
-            put_page_and_type(dst_page);
-            put_page(src_page);
-            break;
-        }
-
-        case MMUEXT_MARK_SUPER:
-        case MMUEXT_UNMARK_SUPER:
-            rc = -EOPNOTSUPP;
-            break;
-
-        default:
-            rc = -ENOSYS;
-            break;
-        }
-
- done:
-        if ( unlikely(rc) )
-            break;
-
-        guest_handle_add_offset(uops, 1);
-    }
-
-    if ( rc == -ERESTART )
-    {
-        ASSERT(i < count);
-        rc = hypercall_create_continuation(
-            __HYPERVISOR_mmuext_op, "hihi",
-            uops, (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom);
-    }
-    else if ( curr->arch.old_guest_table )
-    {
-        XEN_GUEST_HANDLE_PARAM(void) null;
-
-        ASSERT(rc || i == count);
-        set_xen_guest_handle(null, NULL);
-        /*
-         * In order to have a way to communicate the final return value to
-         * our continuation, we pass this in place of "foreigndom", building
-         * on the fact that this argument isn't needed anymore.
-         */
-        rc = hypercall_create_continuation(
-                __HYPERVISOR_mmuext_op, "hihi", null,
-                MMU_UPDATE_PREEMPTED, null, rc);
-    }
-
-    put_pg_owner(pg_owner);
-
-    perfc_add(num_mmuext_ops, i);
-
-    /* Add incremental work we have done to the @done output parameter. */
-    if ( unlikely(!guest_handle_is_null(pdone)) )
-    {
-        done += i;
-        copy_to_guest(pdone, &done, 1);
-    }
-
-    return rc;
-}
-
-long do_mmu_update(
-    XEN_GUEST_HANDLE_PARAM(mmu_update_t) ureqs,
-    unsigned int count,
-    XEN_GUEST_HANDLE_PARAM(uint) pdone,
-    unsigned int foreigndom)
-{
-    struct mmu_update req;
-    void *va = NULL;
-    unsigned long gpfn, gmfn, mfn;
-    struct page_info *page;
-    unsigned int cmd, i = 0, done = 0, pt_dom;
-    struct vcpu *curr = current, *v = curr;
-    struct domain *d = v->domain, *pt_owner = d, *pg_owner;
-    mfn_t map_mfn = INVALID_MFN;
-    uint32_t xsm_needed = 0;
-    uint32_t xsm_checked = 0;
-    int rc = put_old_guest_table(curr);
-
-    if ( unlikely(rc) )
-    {
-        if ( likely(rc == -ERESTART) )
-            rc = hypercall_create_continuation(
-                     __HYPERVISOR_mmu_update, "hihi", ureqs, count, pdone,
-                     foreigndom);
-        return rc;
-    }
-
-    if ( unlikely(count == MMU_UPDATE_PREEMPTED) &&
-         likely(guest_handle_is_null(ureqs)) )
-    {
-        /*
-         * See the curr->arch.old_guest_table related
-         * hypercall_create_continuation() below.
-         */
-        return (int)foreigndom;
-    }
-
-    if ( unlikely(count & MMU_UPDATE_PREEMPTED) )
-    {
-        count &= ~MMU_UPDATE_PREEMPTED;
-        if ( unlikely(!guest_handle_is_null(pdone)) )
-            (void)copy_from_guest(&done, pdone, 1);
-    }
-    else
-        perfc_incr(calls_to_mmu_update);
-
-    if ( unlikely(!guest_handle_okay(ureqs, count)) )
-        return -EFAULT;
-
-    if ( (pt_dom = foreigndom >> 16) != 0 )
-    {
-        /* Pagetables belong to a foreign domain (PFD). */
-        if ( (pt_owner = rcu_lock_domain_by_id(pt_dom - 1)) == NULL )
-            return -ESRCH;
-
-        if ( pt_owner == d )
-            rcu_unlock_domain(pt_owner);
-        else if ( !pt_owner->vcpu || (v = pt_owner->vcpu[0]) == NULL )
-        {
-            rc = -EINVAL;
-            goto out;
-        }
-    }
-
-    if ( (pg_owner = get_pg_owner((uint16_t)foreigndom)) == NULL )
-    {
-        rc = -ESRCH;
-        goto out;
-    }
-
-    for ( i = 0; i < count; i++ )
-    {
-        if ( curr->arch.old_guest_table || (i && hypercall_preempt_check()) )
-        {
-            rc = -ERESTART;
-            break;
-        }
-
-        if ( unlikely(__copy_from_guest(&req, ureqs, 1) != 0) )
-        {
-            rc = -EFAULT;
-            break;
-        }
-
-        cmd = req.ptr & (sizeof(l1_pgentry_t)-1);
-
-        switch ( cmd )
-        {
-            /*
-             * MMU_NORMAL_PT_UPDATE: Normal update to any level of page table.
-             * MMU_PT_UPDATE_PRESERVE_AD: As above but also preserve (OR)
-             * current A/D bits.
-             */
-        case MMU_NORMAL_PT_UPDATE:
-        case MMU_PT_UPDATE_PRESERVE_AD:
-        {
-            p2m_type_t p2mt;
-
-            rc = -EOPNOTSUPP;
-            if ( unlikely(paging_mode_refcounts(pt_owner)) )
-                break;
-
-            xsm_needed |= XSM_MMU_NORMAL_UPDATE;
-            if ( get_pte_flags(req.val) & _PAGE_PRESENT )
-            {
-                xsm_needed |= XSM_MMU_UPDATE_READ;
-                if ( get_pte_flags(req.val) & _PAGE_RW )
-                    xsm_needed |= XSM_MMU_UPDATE_WRITE;
-            }
-            if ( xsm_needed != xsm_checked )
-            {
-                rc = xsm_mmu_update(XSM_TARGET, d, pt_owner, pg_owner, xsm_needed);
-                if ( rc )
-                    break;
-                xsm_checked = xsm_needed;
-            }
-            rc = -EINVAL;
-
-            req.ptr -= cmd;
-            gmfn = req.ptr >> PAGE_SHIFT;
-            page = get_page_from_gfn(pt_owner, gmfn, &p2mt, P2M_ALLOC);
-
-            if ( p2m_is_paged(p2mt) )
-            {
-                ASSERT(!page);
-                p2m_mem_paging_populate(pg_owner, gmfn);
-                rc = -ENOENT;
-                break;
-            }
-
-            if ( unlikely(!page) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Could not get page for normal update\n");
-                break;
-            }
-
-            mfn = mfn_x(page_to_mfn(page));
-
-            if ( !mfn_eq(_mfn(mfn), map_mfn) )
-            {
-                if ( va )
-                    unmap_domain_page(va);
-                va = map_domain_page(_mfn(mfn));
-                map_mfn = _mfn(mfn);
-            }
-            va = _p(((unsigned long)va & PAGE_MASK) + (req.ptr & ~PAGE_MASK));
-
-            if ( page_lock(page) )
-            {
-                switch ( page->u.inuse.type_info & PGT_type_mask )
-                {
-                case PGT_l1_page_table:
-                {
-                    l1_pgentry_t l1e = l1e_from_intpte(req.val);
-                    p2m_type_t l1e_p2mt = p2m_ram_rw;
-                    struct page_info *target = NULL;
-                    p2m_query_t q = (l1e_get_flags(l1e) & _PAGE_RW) ?
-                                        P2M_UNSHARE : P2M_ALLOC;
-
-                    if ( paging_mode_translate(pg_owner) )
-                        target = get_page_from_gfn(pg_owner, l1e_get_pfn(l1e),
-                                                   &l1e_p2mt, q);
-
-                    if ( p2m_is_paged(l1e_p2mt) )
-                    {
-                        if ( target )
-                            put_page(target);
-                        p2m_mem_paging_populate(pg_owner, l1e_get_pfn(l1e));
-                        rc = -ENOENT;
-                        break;
-                    }
-                    else if ( p2m_ram_paging_in == l1e_p2mt && !target )
-                    {
-                        rc = -ENOENT;
-                        break;
-                    }
-                    /* If we tried to unshare and failed */
-                    else if ( (q & P2M_UNSHARE) && p2m_is_shared(l1e_p2mt) )
-                    {
-                        /* We could not have obtained a page ref. */
-                        ASSERT(target == NULL);
-                        /* And mem_sharing_notify has already been called. */
-                        rc = -ENOMEM;
-                        break;
-                    }
-
-                    rc = mod_l1_entry(va, l1e, mfn,
-                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v,
-                                      pg_owner);
-                    if ( target )
-                        put_page(target);
-                }
-                break;
-                case PGT_l2_page_table:
-                    rc = mod_l2_entry(va, l2e_from_intpte(req.val), mfn,
-                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
-                    break;
-                case PGT_l3_page_table:
-                    rc = mod_l3_entry(va, l3e_from_intpte(req.val), mfn,
-                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
-                    break;
-                case PGT_l4_page_table:
-                    rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
-                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
-                break;
-                case PGT_writable_page:
-                    perfc_incr(writable_mmu_updates);
-                    if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) )
-                        rc = 0;
-                    break;
-                }
-                page_unlock(page);
-                if ( rc == -EINTR )
-                    rc = -ERESTART;
-            }
-            else if ( get_page_type(page, PGT_writable_page) )
-            {
-                perfc_incr(writable_mmu_updates);
-                if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) )
-                    rc = 0;
-                put_page_type(page);
-            }
-
-            put_page(page);
-        }
-        break;
-
-        case MMU_MACHPHYS_UPDATE:
-            if ( unlikely(d != pt_owner) )
-            {
-                rc = -EPERM;
-                break;
-            }
-
-            if ( unlikely(paging_mode_translate(pg_owner)) )
-            {
-                rc = -EINVAL;
-                break;
-            }
-
-            mfn = req.ptr >> PAGE_SHIFT;
-            gpfn = req.val;
-
-            xsm_needed |= XSM_MMU_MACHPHYS_UPDATE;
-            if ( xsm_needed != xsm_checked )
-            {
-                rc = xsm_mmu_update(XSM_TARGET, d, NULL, pg_owner, xsm_needed);
-                if ( rc )
-                    break;
-                xsm_checked = xsm_needed;
-            }
-
-            if ( unlikely(!get_page_from_mfn(_mfn(mfn), pg_owner)) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Could not get page for mach->phys update\n");
-                rc = -EINVAL;
-                break;
-            }
-
-            set_gpfn_from_mfn(mfn, gpfn);
-
-            paging_mark_dirty(pg_owner, _mfn(mfn));
-
-            put_page(mfn_to_page(_mfn(mfn)));
-            break;
-
-        default:
-            rc = -ENOSYS;
-            break;
-        }
-
-        if ( unlikely(rc) )
-            break;
-
-        guest_handle_add_offset(ureqs, 1);
-    }
-
-    if ( rc == -ERESTART )
-    {
-        ASSERT(i < count);
-        rc = hypercall_create_continuation(
-            __HYPERVISOR_mmu_update, "hihi",
-            ureqs, (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom);
-    }
-    else if ( curr->arch.old_guest_table )
-    {
-        XEN_GUEST_HANDLE_PARAM(void) null;
-
-        ASSERT(rc || i == count);
-        set_xen_guest_handle(null, NULL);
-        /*
-         * In order to have a way to communicate the final return value to
-         * our continuation, we pass this in place of "foreigndom", building
-         * on the fact that this argument isn't needed anymore.
-         */
-        rc = hypercall_create_continuation(
-                __HYPERVISOR_mmu_update, "hihi", null,
-                MMU_UPDATE_PREEMPTED, null, rc);
-    }
-
-    put_pg_owner(pg_owner);
-
-    if ( va )
-        unmap_domain_page(va);
-
-    perfc_add(num_page_updates, i);
-
- out:
-    if ( pt_owner != d )
-        rcu_unlock_domain(pt_owner);
-
-    /* Add incremental work we have done to the @done output parameter. */
-    if ( unlikely(!guest_handle_is_null(pdone)) )
-    {
-        done += i;
-        copy_to_guest(pdone, &done, 1);
-    }
-
-    return rc;
-}
-
-int donate_page(
-    struct domain *d, struct page_info *page, unsigned int memflags)
-{
-    const struct domain *owner = dom_xen;
-
-    spin_lock(&d->page_alloc_lock);
-
-    if ( is_xen_heap_page(page) || ((owner = page_get_owner(page)) != NULL) )
-        goto fail;
-
-    if ( d->is_dying )
-        goto fail;
-
-    if ( page->count_info & ~(PGC_allocated | 1) )
-        goto fail;
-
-    if ( !(memflags & MEMF_no_refcount) )
-    {
-        if ( d->tot_pages >= d->max_pages )
-            goto fail;
-        domain_adjust_tot_pages(d, 1);
-    }
-
-    page->count_info = PGC_allocated | 1;
-    page_set_owner(page, d);
-    page_list_add_tail(page, &d->page_list);
-
-    spin_unlock(&d->page_alloc_lock);
-    return 0;
-
- fail:
-    spin_unlock(&d->page_alloc_lock);
-    gdprintk(XENLOG_WARNING, "Bad donate mfn %" PRI_mfn
-             " to d%d (owner d%d) caf=%08lx taf=%" PRtype_info "\n",
-             mfn_x(page_to_mfn(page)), d->domain_id,
-             owner ? owner->domain_id : DOMID_INVALID,
-             page->count_info, page->u.inuse.type_info);
-    return -EINVAL;
-}
-
-int steal_page(
-    struct domain *d, struct page_info *page, unsigned int memflags)
-{
-    unsigned long x, y;
-    bool drop_dom_ref = false;
-    const struct domain *owner = dom_xen;
-
-    if ( paging_mode_external(d) )
-        return -EOPNOTSUPP;
-
-    spin_lock(&d->page_alloc_lock);
-
-    if ( is_xen_heap_page(page) || ((owner = page_get_owner(page)) != d) )
-        goto fail;
-
-    /*
-     * We require there is just one reference (PGC_allocated). We temporarily
-     * drop this reference now so that we can safely swizzle the owner.
-     */
-    y = page->count_info;
-    do {
-        x = y;
-        if ( (x & (PGC_count_mask|PGC_allocated)) != (1 | PGC_allocated) )
-            goto fail;
-        y = cmpxchg(&page->count_info, x, x & ~PGC_count_mask);
-    } while ( y != x );
-
-    /*
-     * With the sole reference dropped temporarily, no-one can update type
-     * information. Type count also needs to be zero in this case, but e.g.
-     * PGT_seg_desc_page may still have PGT_validated set, which we need to
-     * clear before transferring ownership (as validation criteria vary
-     * depending on domain type).
-     */
-    BUG_ON(page->u.inuse.type_info & (PGT_count_mask | PGT_locked |
-                                      PGT_pinned));
-    page->u.inuse.type_info = 0;
-
-    /* Swizzle the owner then reinstate the PGC_allocated reference. */
-    page_set_owner(page, NULL);
-    y = page->count_info;
-    do {
-        x = y;
-        BUG_ON((x & (PGC_count_mask|PGC_allocated)) != PGC_allocated);
-    } while ( (y = cmpxchg(&page->count_info, x, x | 1)) != x );
-
-    /* Unlink from original owner. */
-    if ( !(memflags & MEMF_no_refcount) && !domain_adjust_tot_pages(d, -1) )
-        drop_dom_ref = true;
-    page_list_del(page, &d->page_list);
-
-    spin_unlock(&d->page_alloc_lock);
-    if ( unlikely(drop_dom_ref) )
-        put_domain(d);
-    return 0;
+    spin_unlock(&d->page_alloc_lock);
+    if ( unlikely(drop_dom_ref) )
+        put_domain(d);
+    return 0;
 
  fail:
     spin_unlock(&d->page_alloc_lock);
@@ -2739,122 +1431,6 @@ int steal_page(
     return -EINVAL;
 }
 
-static int __do_update_va_mapping(
-    unsigned long va, u64 val64, unsigned long flags, struct domain *pg_owner)
-{
-    l1_pgentry_t   val = l1e_from_intpte(val64);
-    struct vcpu   *v   = current;
-    struct domain *d   = v->domain;
-    struct page_info *gl1pg;
-    l1_pgentry_t  *pl1e;
-    unsigned long  bmap_ptr;
-    mfn_t          gl1mfn;
-    cpumask_t     *mask = NULL;
-    int            rc;
-
-    perfc_incr(calls_to_update_va);
-
-    rc = xsm_update_va_mapping(XSM_TARGET, d, pg_owner, val);
-    if ( rc )
-        return rc;
-
-    rc = -EINVAL;
-    pl1e = map_guest_l1e(va, &gl1mfn);
-    if ( unlikely(!pl1e || !get_page_from_mfn(gl1mfn, d)) )
-        goto out;
-
-    gl1pg = mfn_to_page(gl1mfn);
-    if ( !page_lock(gl1pg) )
-    {
-        put_page(gl1pg);
-        goto out;
-    }
-
-    if ( (gl1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-    {
-        page_unlock(gl1pg);
-        put_page(gl1pg);
-        goto out;
-    }
-
-    rc = mod_l1_entry(pl1e, val, mfn_x(gl1mfn), 0, v, pg_owner);
-
-    page_unlock(gl1pg);
-    put_page(gl1pg);
-
- out:
-    if ( pl1e )
-        unmap_domain_page(pl1e);
-
-    switch ( flags & UVMF_FLUSHTYPE_MASK )
-    {
-    case UVMF_TLB_FLUSH:
-        switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) )
-        {
-        case UVMF_LOCAL:
-            flush_tlb_local();
-            break;
-        case UVMF_ALL:
-            mask = d->domain_dirty_cpumask;
-            break;
-        default:
-            mask = this_cpu(scratch_cpumask);
-            rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr,
-                                                                     void),
-                                      mask);
-            break;
-        }
-        if ( mask )
-            flush_tlb_mask(mask);
-        break;
-
-    case UVMF_INVLPG:
-        switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) )
-        {
-        case UVMF_LOCAL:
-            paging_invlpg(v, va);
-            break;
-        case UVMF_ALL:
-            mask = d->domain_dirty_cpumask;
-            break;
-        default:
-            mask = this_cpu(scratch_cpumask);
-            rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr,
-                                                                     void),
-                                      mask);
-            break;
-        }
-        if ( mask )
-            flush_tlb_one_mask(mask, va);
-        break;
-    }
-
-    return rc;
-}
-
-long do_update_va_mapping(unsigned long va, u64 val64,
-                          unsigned long flags)
-{
-    return __do_update_va_mapping(va, val64, flags, current->domain);
-}
-
-long do_update_va_mapping_otherdomain(unsigned long va, u64 val64,
-                                      unsigned long flags,
-                                      domid_t domid)
-{
-    struct domain *pg_owner;
-    int rc;
-
-    if ( (pg_owner = get_pg_owner(domid)) == NULL )
-        return -ESRCH;
-
-    rc = __do_update_va_mapping(va, val64, flags, pg_owner);
-
-    put_pg_owner(pg_owner);
-
-    return rc;
-}
-
 typedef struct e820entry e820entry_t;
 DEFINE_XEN_GUEST_HANDLE(e820entry_t);
 
diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile
index bac2792aa2..17e058db7e 100644
--- a/xen/arch/x86/pv/Makefile
+++ b/xen/arch/x86/pv/Makefile
@@ -10,6 +10,7 @@ obj-y += hypercall.o
 obj-y += iret.o
 obj-y += misc-hypercalls.o
 obj-y += mm.o
+obj-y += mm-hypercalls.o
 obj-y += ro-page-fault.o
 obj-y += traps.o
 
diff --git a/xen/arch/x86/pv/mm-hypercalls.c b/xen/arch/x86/pv/mm-hypercalls.c
new file mode 100644
index 0000000000..29a609391c
--- /dev/null
+++ b/xen/arch/x86/pv/mm-hypercalls.c
@@ -0,0 +1,1461 @@
+/******************************************************************************
+ * arch/x86/pv/mm-hypercalls.c
+ *
+ * Memory management hypercalls for PV guests
+ *
+ * Copyright (c) 2002-2005 K A Fraser
+ * Copyright (c) 2004 Christian Limpach
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/event.h>
+#include <xen/guest_access.h>
+
+#include <asm/hypercall.h>
+#include <asm/iocap.h>
+#include <asm/ldt.h>
+#include <asm/mm.h>
+#include <asm/p2m.h>
+#include <asm/pv/mm.h>
+#include <asm/setup.h>
+
+#include <xsm/xsm.h>
+
+#include "mm.h"
+
+/* Override macros from asm/page.h to make them work with mfn_t */
+#undef mfn_to_page
+#define mfn_to_page(mfn) __mfn_to_page(mfn_x(mfn))
+#undef page_to_mfn
+#define page_to_mfn(pg) _mfn(__page_to_mfn(pg))
+
+static struct domain *get_pg_owner(domid_t domid)
+{
+    struct domain *pg_owner = NULL, *curr = current->domain;
+
+    if ( likely(domid == DOMID_SELF) )
+    {
+        pg_owner = rcu_lock_current_domain();
+        goto out;
+    }
+
+    if ( unlikely(domid == curr->domain_id) )
+    {
+        gdprintk(XENLOG_WARNING, "Cannot specify itself as foreign domain\n");
+        goto out;
+    }
+
+    switch ( domid )
+    {
+    case DOMID_IO:
+        pg_owner = rcu_lock_domain(dom_io);
+        break;
+    case DOMID_XEN:
+        pg_owner = rcu_lock_domain(dom_xen);
+        break;
+    default:
+        if ( (pg_owner = rcu_lock_domain_by_id(domid)) == NULL )
+        {
+            gdprintk(XENLOG_WARNING, "Unknown domain d%d\n", domid);
+            break;
+        }
+        break;
+    }
+
+ out:
+    return pg_owner;
+}
+
+static void put_pg_owner(struct domain *pg_owner)
+{
+    rcu_unlock_domain(pg_owner);
+}
+
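+/*
+ * Callers must pair every successful get_pg_owner() with put_pg_owner();
+ * the RCU reference taken in get_pg_owner() pins the page owner for the
+ * duration of the hypercall.
+ */
+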
+static inline int vcpumask_to_pcpumask(
+    struct domain *d, XEN_GUEST_HANDLE_PARAM(const_void) bmap, cpumask_t *pmask)
+{
+    unsigned int vcpu_id, vcpu_bias, offs;
+    unsigned long vmask;
+    struct vcpu *v;
+    bool is_native = !is_pv_32bit_domain(d);
+
+    cpumask_clear(pmask);
+    for ( vmask = 0, offs = 0; ; ++offs )
+    {
+        vcpu_bias = offs * (is_native ? BITS_PER_LONG : 32);
+        if ( vcpu_bias >= d->max_vcpus )
+            return 0;
+
+        if ( unlikely(is_native ?
+                      copy_from_guest_offset(&vmask, bmap, offs, 1) :
+                      copy_from_guest_offset((unsigned int *)&vmask, bmap,
+                                             offs, 1)) )
+        {
+            cpumask_clear(pmask);
+            return -EFAULT;
+        }
+
+        while ( vmask )
+        {
+            vcpu_id = find_first_set_bit(vmask);
+            vmask &= ~(1UL << vcpu_id);
+            vcpu_id += vcpu_bias;
+            if ( vcpu_id >= d->max_vcpus )
+                return 0;
+            if ( (v = d->vcpu[vcpu_id]) != NULL )
+                cpumask_or(pmask, pmask, v->vcpu_dirty_cpumask);
+        }
+    }
+}
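+
+/*
+ * Illustrative layout note: the guest-supplied bitmap is read in
+ * BITS_PER_LONG-bit words for 64-bit PV guests and in 32-bit words for
+ * compat guests, so bit N always denotes vCPU N regardless of guest
+ * bitness.
+ */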
+
+/*
+ * PTE flags that a guest may change without re-validating the PTE.
+ * All other bits affect translation, caching, or Xen's safety.
+ */
+#define FASTPATH_FLAG_WHITELIST                                     \
+    (_PAGE_NX_BIT | _PAGE_AVAIL_HIGH | _PAGE_AVAIL | _PAGE_GLOBAL | \
+     _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_USER)
+
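+/*
+ * Example (illustrative): rewriting a PTE so that only _PAGE_ACCESSED or
+ * _PAGE_DIRTY toggles leaves every bit outside the whitelist unchanged,
+ * so mod_l1_entry() takes the fast path below and skips the expensive
+ * get_page_from_l1e() re-validation.
+ */
+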
+/* Update the L1 entry at pl1e to new value nl1e. */
+static int mod_l1_entry(l1_pgentry_t *pl1e, l1_pgentry_t nl1e,
+                        unsigned long gl1mfn, int preserve_ad,
+                        struct vcpu *pt_vcpu, struct domain *pg_dom)
+{
+    l1_pgentry_t ol1e;
+    struct domain *pt_dom = pt_vcpu->domain;
+    int rc = 0;
+
+    if ( unlikely(__copy_from_user(&ol1e, pl1e, sizeof(ol1e)) != 0) )
+        return -EFAULT;
+
+    ASSERT(!paging_mode_refcounts(pt_dom));
+
+    if ( l1e_get_flags(nl1e) & _PAGE_PRESENT )
+    {
+        /* Translate foreign guest addresses. */
+        struct page_info *page = NULL;
+
+        if ( unlikely(l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom)) )
+        {
+            gdprintk(XENLOG_WARNING, "Bad L1 flags %x\n",
+                    l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom));
+            return -EINVAL;
+        }
+
+        if ( paging_mode_translate(pg_dom) )
+        {
+            page = get_page_from_gfn(pg_dom, l1e_get_pfn(nl1e), NULL, P2M_ALLOC);
+            if ( !page )
+                return -EINVAL;
+            nl1e = l1e_from_page(page, l1e_get_flags(nl1e));
+        }
+
+        /* Fast path for sufficiently-similar mappings. */
+        if ( !l1e_has_changed(ol1e, nl1e, ~FASTPATH_FLAG_WHITELIST) )
+        {
+            nl1e = adjust_guest_l1e(nl1e, pt_dom);
+            rc = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
+                              preserve_ad);
+            if ( page )
+                put_page(page);
+            return rc ? 0 : -EBUSY;
+        }
+
+        switch ( rc = get_page_from_l1e(nl1e, pt_dom, pg_dom) )
+        {
+        default:
+            if ( page )
+                put_page(page);
+            return rc;
+        case 0:
+            break;
+        case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+            ASSERT(!(rc & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+            l1e_flip_flags(nl1e, rc);
+            rc = 0;
+            break;
+        }
+        if ( page )
+            put_page(page);
+
+        nl1e = adjust_guest_l1e(nl1e, pt_dom);
+        if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
+                                    preserve_ad)) )
+        {
+            ol1e = nl1e;
+            rc = -EBUSY;
+        }
+    }
+    else if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
+                                     preserve_ad)) )
+    {
+        return -EBUSY;
+    }
+
+    put_page_from_l1e(ol1e, pt_dom);
+    return rc;
+}
+
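+/*
+ * Note on return conventions shared by the mod_lN_entry() helpers:
+ * UPDATE_ENTRY() yields false when the cmpxchg on the PTE loses a race
+ * with the guest (or the pagetable write itself fails); L1/L2 report
+ * that as -EBUSY while L3/L4 use -EFAULT, as in the pre-split mm.c.
+ */
+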
+/* Update the L2 entry at pl2e to new value nl2e. pl2e is within frame pfn. */
+static int mod_l2_entry(l2_pgentry_t *pl2e, l2_pgentry_t nl2e,
+                        unsigned long pfn, int preserve_ad, struct vcpu *vcpu)
+{
+    l2_pgentry_t ol2e;
+    struct domain *d = vcpu->domain;
+    struct page_info *l2pg = mfn_to_page(_mfn(pfn));
+    unsigned long type = l2pg->u.inuse.type_info;
+    int rc = 0;
+
+    if ( unlikely(!is_guest_l2_slot(d, type, pgentry_ptr_to_slot(pl2e))) )
+    {
+        gdprintk(XENLOG_WARNING, "L2 update in Xen-private area, slot %#lx\n",
+                 pgentry_ptr_to_slot(pl2e));
+        return -EPERM;
+    }
+
+    if ( unlikely(__copy_from_user(&ol2e, pl2e, sizeof(ol2e)) != 0) )
+        return -EFAULT;
+
+    if ( l2e_get_flags(nl2e) & _PAGE_PRESENT )
+    {
+        if ( unlikely(l2e_get_flags(nl2e) & L2_DISALLOW_MASK) )
+        {
+            gdprintk(XENLOG_WARNING, "Bad L2 flags %x\n",
+                    l2e_get_flags(nl2e) & L2_DISALLOW_MASK);
+            return -EINVAL;
+        }
+
+        /* Fast path for sufficiently-similar mappings. */
+        if ( !l2e_has_changed(ol2e, nl2e, ~FASTPATH_FLAG_WHITELIST) )
+        {
+            nl2e = adjust_guest_l2e(nl2e, d);
+            if ( UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu, preserve_ad) )
+                return 0;
+            return -EBUSY;
+        }
+
+        if ( unlikely((rc = get_page_from_l2e(nl2e, pfn, d)) < 0) )
+            return rc;
+
+        nl2e = adjust_guest_l2e(nl2e, d);
+        if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu,
+                                    preserve_ad)) )
+        {
+            ol2e = nl2e;
+            rc = -EBUSY;
+        }
+    }
+    else if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu,
+                                     preserve_ad)) )
+    {
+        return -EBUSY;
+    }
+
+    put_page_from_l2e(ol2e, pfn);
+    return rc;
+}
+
+/* Update the L3 entry at pl3e to new value nl3e. pl3e is within frame pfn. */
+static int mod_l3_entry(l3_pgentry_t *pl3e, l3_pgentry_t nl3e,
+                        unsigned long pfn, int preserve_ad, struct vcpu *vcpu)
+{
+    l3_pgentry_t ol3e;
+    struct domain *d = vcpu->domain;
+    int rc = 0;
+
+    /*
+     * Disallow updates to final L3 slot. It contains Xen mappings, and it
+     * would be a pain to ensure they remain continuously valid throughout.
+     */
+    if ( is_pv_32bit_domain(d) && (pgentry_ptr_to_slot(pl3e) >= 3) )
+        return -EINVAL;
+
+    if ( unlikely(__copy_from_user(&ol3e, pl3e, sizeof(ol3e)) != 0) )
+        return -EFAULT;
+
+    if ( l3e_get_flags(nl3e) & _PAGE_PRESENT )
+    {
+        if ( unlikely(l3e_get_flags(nl3e) & l3_disallow_mask(d)) )
+        {
+            gdprintk(XENLOG_WARNING, "Bad L3 flags %x\n",
+                    l3e_get_flags(nl3e) & l3_disallow_mask(d));
+            return -EINVAL;
+        }
+
+        /* Fast path for sufficiently-similar mappings. */
+        if ( !l3e_has_changed(ol3e, nl3e, ~FASTPATH_FLAG_WHITELIST) )
+        {
+            nl3e = adjust_guest_l3e(nl3e, d);
+            rc = UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu, preserve_ad);
+            return rc ? 0 : -EFAULT;
+        }
+
+        rc = get_page_from_l3e(nl3e, pfn, d, 0);
+        if ( unlikely(rc < 0) )
+            return rc;
+        rc = 0;
+
+        nl3e = adjust_guest_l3e(nl3e, d);
+        if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu,
+                                    preserve_ad)) )
+        {
+            ol3e = nl3e;
+            rc = -EFAULT;
+        }
+    }
+    else if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu,
+                                     preserve_ad)) )
+    {
+        return -EFAULT;
+    }
+
+    if ( likely(rc == 0) )
+        if ( !create_pae_xen_mappings(d, pl3e) )
+            BUG();
+
+    put_page_from_l3e(ol3e, pfn, 0, 1);
+    return rc;
+}
+
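+/*
+ * Note: for 32-bit PAE guests, the create_pae_xen_mappings() call above
+ * re-establishes the Xen slots of the L3 after every successful update;
+ * for 64-bit guests it is expected to be a no-op that returns success.
+ */
+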
+/* Update the L4 entry at pl4e to new value nl4e. pl4e is within frame pfn. */
+static int mod_l4_entry(l4_pgentry_t *pl4e, l4_pgentry_t nl4e,
+                        unsigned long pfn, int preserve_ad, struct vcpu *vcpu)
+{
+    struct domain *d = vcpu->domain;
+    l4_pgentry_t ol4e;
+    int rc = 0;
+
+    if ( unlikely(!is_guest_l4_slot(d, pgentry_ptr_to_slot(pl4e))) )
+    {
+        gdprintk(XENLOG_WARNING, "L4 update in Xen-private area, slot %#lx\n",
+                 pgentry_ptr_to_slot(pl4e));
+        return -EINVAL;
+    }
+
+    if ( unlikely(__copy_from_user(&ol4e, pl4e, sizeof(ol4e)) != 0) )
+        return -EFAULT;
+
+    if ( l4e_get_flags(nl4e) & _PAGE_PRESENT )
+    {
+        if ( unlikely(l4e_get_flags(nl4e) & L4_DISALLOW_MASK) )
+        {
+            gdprintk(XENLOG_WARNING, "Bad L4 flags %x\n",
+                    l4e_get_flags(nl4e) & L4_DISALLOW_MASK);
+            return -EINVAL;
+        }
+
+        /* Fast path for sufficiently-similar mappings. */
+        if ( !l4e_has_changed(ol4e, nl4e, ~FASTPATH_FLAG_WHITELIST) )
+        {
+            nl4e = adjust_guest_l4e(nl4e, d);
+            rc = UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu, preserve_ad);
+            return rc ? 0 : -EFAULT;
+        }
+
+        rc = get_page_from_l4e(nl4e, pfn, d, 0);
+        if ( unlikely(rc < 0) )
+            return rc;
+        rc = 0;
+
+        nl4e = adjust_guest_l4e(nl4e, d);
+        if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu,
+                                    preserve_ad)) )
+        {
+            ol4e = nl4e;
+            rc = -EFAULT;
+        }
+    }
+    else if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu,
+                                     preserve_ad)) )
+    {
+        return -EFAULT;
+    }
+
+    put_page_from_l4e(ol4e, pfn, 0, 1);
+    return rc;
+}
+
+int new_guest_cr3(mfn_t mfn)
+{
+    struct vcpu *curr = current;
+    struct domain *currd = curr->domain;
+    int rc;
+    mfn_t old_base_mfn;
+
+    if ( is_pv_32bit_domain(currd) )
+    {
+        mfn_t gt_mfn = pagetable_get_mfn(curr->arch.guest_table);
+        l4_pgentry_t *pl4e = map_domain_page(gt_mfn);
+
+        rc = mod_l4_entry(pl4e,
+                          l4e_from_mfn(mfn,
+                                       (_PAGE_PRESENT | _PAGE_RW |
+                                        _PAGE_USER | _PAGE_ACCESSED)),
+                          mfn_x(gt_mfn), 0, curr);
+        unmap_domain_page(pl4e);
+        switch ( rc )
+        {
+        case 0:
+            break;
+        case -EINTR:
+        case -ERESTART:
+            return -ERESTART;
+        default:
+            gdprintk(XENLOG_WARNING,
+                     "Error while installing new compat baseptr %" PRI_mfn "\n",
+                     mfn_x(mfn));
+            return rc;
+        }
+
+        pv_invalidate_shadow_ldt(curr, 0);
+        write_ptbase(curr);
+
+        return 0;
+    }
+
+    rc = put_old_guest_table(curr);
+    if ( unlikely(rc) )
+        return rc;
+
+    old_base_mfn = pagetable_get_mfn(curr->arch.guest_table);
+    /*
+     * This is particularly important when getting restarted after the
+     * previous attempt got preempted in the put-old-MFN phase.
+     */
+    if ( mfn_eq(old_base_mfn, mfn) )
+    {
+        write_ptbase(curr);
+        return 0;
+    }
+
+    rc = paging_mode_refcounts(currd)
+         ? (get_page_from_mfn(mfn, currd) ? 0 : -EINVAL)
+         : get_page_and_type_from_mfn(mfn, PGT_root_page_table, currd, 0, 1);
+    switch ( rc )
+    {
+    case 0:
+        break;
+    case -EINTR:
+    case -ERESTART:
+        return -ERESTART;
+    default:
+        gdprintk(XENLOG_WARNING,
+                 "Error while installing new baseptr %" PRI_mfn "\n",
+                 mfn_x(mfn));
+        return rc;
+    }
+
+    pv_invalidate_shadow_ldt(curr, 0);
+
+    if ( !VM_ASSIST(currd, m2p_strict) && !paging_mode_refcounts(currd) )
+        fill_ro_mpt(mfn);
+    curr->arch.guest_table = pagetable_from_mfn(mfn);
+    update_cr3(curr);
+
+    write_ptbase(curr);
+
+    if ( likely(mfn_x(old_base_mfn) != 0) )
+    {
+        struct page_info *page = mfn_to_page(old_base_mfn);
+
+        if ( paging_mode_refcounts(currd) )
+            put_page(page);
+        else
+            switch ( rc = put_page_and_type_preemptible(page) )
+            {
+            case -EINTR:
+                rc = -ERESTART;
+                /* fallthrough */
+            case -ERESTART:
+                curr->arch.old_guest_table = page;
+                break;
+            default:
+                BUG_ON(rc);
+                break;
+            }
+    }
+
+    return rc;
+}
+
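+/*
+ * Illustrative guest-side usage (a hypothetical sketch, not part of this
+ * file): a 64-bit PV kernel switches its top-level page table with
+ *
+ *     struct mmuext_op op = {
+ *         .cmd = MMUEXT_NEW_BASEPTR,
+ *         .arg1.mfn = new_l4_mfn,
+ *     };
+ *     HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF);
+ *
+ * which reaches new_guest_cr3() via do_mmuext_op() below.
+ */
+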
+long do_mmuext_op(XEN_GUEST_HANDLE_PARAM(mmuext_op_t) uops, unsigned int count,
+                  XEN_GUEST_HANDLE_PARAM(uint) pdone, unsigned int foreigndom)
+{
+    struct mmuext_op op;
+    unsigned long type;
+    unsigned int i, done = 0;
+    struct vcpu *curr = current;
+    struct domain *currd = curr->domain;
+    struct domain *pg_owner;
+    int rc = put_old_guest_table(curr);
+
+    if ( unlikely(rc) )
+    {
+        if ( likely(rc == -ERESTART) )
+            rc = hypercall_create_continuation(
+                     __HYPERVISOR_mmuext_op, "hihi", uops, count, pdone,
+                     foreigndom);
+        return rc;
+    }
+
+    if ( unlikely(count == MMU_UPDATE_PREEMPTED) &&
+         likely(guest_handle_is_null(uops)) )
+    {
+        /*
+         * See the curr->arch.old_guest_table related
+         * hypercall_create_continuation() below.
+         */
+        return (int)foreigndom;
+    }
+
+    if ( unlikely(count & MMU_UPDATE_PREEMPTED) )
+    {
+        count &= ~MMU_UPDATE_PREEMPTED;
+        if ( unlikely(!guest_handle_is_null(pdone)) )
+            (void)copy_from_guest(&done, pdone, 1);
+    }
+    else
+        perfc_incr(calls_to_mmuext_op);
+
+    if ( unlikely(!guest_handle_okay(uops, count)) )
+        return -EFAULT;
+
+    if ( (pg_owner = get_pg_owner(foreigndom)) == NULL )
+        return -ESRCH;
+
+    if ( !is_pv_domain(pg_owner) )
+    {
+        put_pg_owner(pg_owner);
+        return -EINVAL;
+    }
+
+    rc = xsm_mmuext_op(XSM_TARGET, currd, pg_owner);
+    if ( rc )
+    {
+        put_pg_owner(pg_owner);
+        return rc;
+    }
+
+    for ( i = 0; i < count; i++ )
+    {
+        if ( curr->arch.old_guest_table || (i && hypercall_preempt_check()) )
+        {
+            rc = -ERESTART;
+            break;
+        }
+
+        if ( unlikely(__copy_from_guest(&op, uops, 1) != 0) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        if ( is_hvm_domain(currd) )
+        {
+            switch ( op.cmd )
+            {
+            case MMUEXT_PIN_L1_TABLE:
+            case MMUEXT_PIN_L2_TABLE:
+            case MMUEXT_PIN_L3_TABLE:
+            case MMUEXT_PIN_L4_TABLE:
+            case MMUEXT_UNPIN_TABLE:
+                break;
+            default:
+                rc = -EOPNOTSUPP;
+                goto done;
+            }
+        }
+
+        rc = 0;
+
+        switch ( op.cmd )
+        {
+            struct page_info *page;
+            p2m_type_t p2mt;
+
+        case MMUEXT_PIN_L1_TABLE:
+            type = PGT_l1_page_table;
+            goto pin_page;
+
+        case MMUEXT_PIN_L2_TABLE:
+            type = PGT_l2_page_table;
+            goto pin_page;
+
+        case MMUEXT_PIN_L3_TABLE:
+            type = PGT_l3_page_table;
+            goto pin_page;
+
+        case MMUEXT_PIN_L4_TABLE:
+            if ( is_pv_32bit_domain(pg_owner) )
+                break;
+            type = PGT_l4_page_table;
+
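+            /* L4 falls through to the shared pinning logic below. */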
+        pin_page:
+            /* Ignore pinning of invalid paging levels. */
+            if ( (op.cmd - MMUEXT_PIN_L1_TABLE) > (CONFIG_PAGING_LEVELS - 1) )
+                break;
+
+            if ( paging_mode_refcounts(pg_owner) )
+                break;
+
+            page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC);
+            if ( unlikely(!page) )
+            {
+                rc = -EINVAL;
+                break;
+            }
+
+            rc = get_page_type_preemptible(page, type);
+            if ( unlikely(rc) )
+            {
+                if ( rc == -EINTR )
+                    rc = -ERESTART;
+                else if ( rc != -ERESTART )
+                    gdprintk(XENLOG_WARNING,
+                             "Error %d while pinning mfn %" PRI_mfn "\n",
+                             rc, mfn_x(page_to_mfn(page)));
+                if ( page != curr->arch.old_guest_table )
+                    put_page(page);
+                break;
+            }
+
+            rc = xsm_memory_pin_page(XSM_HOOK, currd, pg_owner, page);
+            if ( !rc && unlikely(test_and_set_bit(_PGT_pinned,
+                                                  &page->u.inuse.type_info)) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "mfn %" PRI_mfn " already pinned\n",
+                         mfn_x(page_to_mfn(page)));
+                rc = -EINVAL;
+            }
+
+            if ( unlikely(rc) )
+                goto pin_drop;
+
+            /* A page is dirtied when its pin status is set. */
+            paging_mark_dirty(pg_owner, page_to_mfn(page));
+
+            /* We can race domain destruction (domain_relinquish_resources). */
+            if ( unlikely(pg_owner != currd) )
+            {
+                bool drop_ref;
+
+                spin_lock(&pg_owner->page_alloc_lock);
+                drop_ref = (pg_owner->is_dying &&
+                            test_and_clear_bit(_PGT_pinned,
+                                               &page->u.inuse.type_info));
+                spin_unlock(&pg_owner->page_alloc_lock);
+                if ( drop_ref )
+                {
+        pin_drop:
+                    if ( type == PGT_l1_page_table )
+                        put_page_and_type(page);
+                    else
+                        curr->arch.old_guest_table = page;
+                }
+            }
+            break;
+
+        case MMUEXT_UNPIN_TABLE:
+            if ( paging_mode_refcounts(pg_owner) )
+                break;
+
+            page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC);
+            if ( unlikely(!page) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "mfn %" PRI_mfn " bad, or bad owner d%d\n",
+                         op.arg1.mfn, pg_owner->domain_id);
+                rc = -EINVAL;
+                break;
+            }
+
+            if ( !test_and_clear_bit(_PGT_pinned, &page->u.inuse.type_info) )
+            {
+                put_page(page);
+                gdprintk(XENLOG_WARNING,
+                         "mfn %" PRI_mfn " not pinned\n", op.arg1.mfn);
+                rc = -EINVAL;
+                break;
+            }
+
+            switch ( rc = put_page_and_type_preemptible(page) )
+            {
+            case -EINTR:
+            case -ERESTART:
+                curr->arch.old_guest_table = page;
+                rc = 0;
+                break;
+            default:
+                BUG_ON(rc);
+                break;
+            }
+            put_page(page);
+
+            /* A page is dirtied when its pin status is cleared. */
+            paging_mark_dirty(pg_owner, page_to_mfn(page));
+            break;
+
+        case MMUEXT_NEW_BASEPTR:
+            if ( unlikely(currd != pg_owner) )
+                rc = -EPERM;
+            else if ( unlikely(paging_mode_translate(currd)) )
+                rc = -EINVAL;
+            else
+                rc = new_guest_cr3(_mfn(op.arg1.mfn));
+            break;
+
+        case MMUEXT_NEW_USER_BASEPTR: {
+            unsigned long old_mfn;
+
+            if ( unlikely(currd != pg_owner) )
+                rc = -EPERM;
+            else if ( unlikely(paging_mode_translate(currd)) )
+                rc = -EINVAL;
+            if ( unlikely(rc) )
+                break;
+
+            old_mfn = pagetable_get_pfn(curr->arch.guest_table_user);
+            /*
+             * This is particularly important when getting restarted after the
+             * previous attempt got preempted in the put-old-MFN phase.
+             */
+            if ( old_mfn == op.arg1.mfn )
+                break;
+
+            if ( op.arg1.mfn != 0 )
+            {
+                rc = get_page_and_type_from_mfn(
+                    _mfn(op.arg1.mfn), PGT_root_page_table, currd, 0, 1);
+
+                if ( unlikely(rc) )
+                {
+                    if ( rc == -EINTR )
+                        rc = -ERESTART;
+                    else if ( rc != -ERESTART )
+                        gdprintk(XENLOG_WARNING,
+                                 "Error %d installing new mfn %" PRI_mfn "\n",
+                                 rc, op.arg1.mfn);
+                    break;
+                }
+
+                if ( VM_ASSIST(currd, m2p_strict) )
+                    zap_ro_mpt(_mfn(op.arg1.mfn));
+            }
+
+            curr->arch.guest_table_user = pagetable_from_pfn(op.arg1.mfn);
+
+            if ( old_mfn != 0 )
+            {
+                page = mfn_to_page(_mfn(old_mfn));
+
+                switch ( rc = put_page_and_type_preemptible(page) )
+                {
+                case -EINTR:
+                    rc = -ERESTART;
+                    /* fallthrough */
+                case -ERESTART:
+                    curr->arch.old_guest_table = page;
+                    break;
+                default:
+                    BUG_ON(rc);
+                    break;
+                }
+            }
+
+            break;
+        }
+
+        case MMUEXT_TLB_FLUSH_LOCAL:
+            if ( likely(currd == pg_owner) )
+                flush_tlb_local();
+            else
+                rc = -EPERM;
+            break;
+
+        case MMUEXT_INVLPG_LOCAL:
+            if ( unlikely(currd != pg_owner) )
+                rc = -EPERM;
+            else
+                paging_invlpg(curr, op.arg1.linear_addr);
+            break;
+
+        case MMUEXT_TLB_FLUSH_MULTI:
+        case MMUEXT_INVLPG_MULTI:
+        {
+            cpumask_t *mask = this_cpu(scratch_cpumask);
+
+            if ( unlikely(currd != pg_owner) )
+                rc = -EPERM;
+            else if ( unlikely(vcpumask_to_pcpumask(currd,
+                                   guest_handle_to_param(op.arg2.vcpumask,
+                                                         const_void),
+                                   mask)) )
+                rc = -EINVAL;
+            if ( unlikely(rc) )
+                break;
+
+            if ( op.cmd == MMUEXT_TLB_FLUSH_MULTI )
+                flush_tlb_mask(mask);
+            else if ( __addr_ok(op.arg1.linear_addr) )
+                flush_tlb_one_mask(mask, op.arg1.linear_addr);
+            break;
+        }
+
+        case MMUEXT_TLB_FLUSH_ALL:
+            if ( likely(currd == pg_owner) )
+                flush_tlb_mask(currd->domain_dirty_cpumask);
+            else
+                rc = -EPERM;
+            break;
+
+        case MMUEXT_INVLPG_ALL:
+            if ( unlikely(currd != pg_owner) )
+                rc = -EPERM;
+            else if ( __addr_ok(op.arg1.linear_addr) )
+                flush_tlb_one_mask(currd->domain_dirty_cpumask,
+                                   op.arg1.linear_addr);
+            break;
+
+        case MMUEXT_FLUSH_CACHE:
+            if ( unlikely(currd != pg_owner) )
+                rc = -EPERM;
+            else if ( unlikely(!cache_flush_permitted(currd)) )
+                rc = -EACCES;
+            else
+                wbinvd();
+            break;
+
+        case MMUEXT_FLUSH_CACHE_GLOBAL:
+            if ( unlikely(currd != pg_owner) )
+                rc = -EPERM;
+            else if ( likely(cache_flush_permitted(currd)) )
+            {
+                unsigned int cpu;
+                cpumask_t *mask = this_cpu(scratch_cpumask);
+
+                cpumask_clear(mask);
+                for_each_online_cpu(cpu)
+                    if ( !cpumask_intersects(mask,
+                                             per_cpu(cpu_sibling_mask, cpu)) )
+                        __cpumask_set_cpu(cpu, mask);
+                flush_mask(mask, FLUSH_CACHE);
+            }
+            else
+                rc = -EINVAL;
+            break;
+
+        case MMUEXT_SET_LDT:
+        {
+            unsigned int ents = op.arg2.nr_ents;
+            unsigned long ptr = ents ? op.arg1.linear_addr : 0;
+
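+            /*
+             * 8192 entries is the architectural x86 LDT limit (64KiB of
+             * 8-byte descriptors), enforced below.
+             */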
+            if ( unlikely(currd != pg_owner) )
+                rc = -EPERM;
+            else if ( paging_mode_external(currd) )
+                rc = -EINVAL;
+            else if ( ((ptr & (PAGE_SIZE - 1)) != 0) || !__addr_ok(ptr) ||
+                      (ents > 8192) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Bad args to SET_LDT: ptr=%lx, ents=%x\n", ptr, ents);
+                rc = -EINVAL;
+            }
+            else if ( (curr->arch.pv_vcpu.ldt_ents != ents) ||
+                      (curr->arch.pv_vcpu.ldt_base != ptr) )
+            {
+                pv_invalidate_shadow_ldt(curr, 0);
+                flush_tlb_local();
+                curr->arch.pv_vcpu.ldt_base = ptr;
+                curr->arch.pv_vcpu.ldt_ents = ents;
+                load_LDT(curr);
+            }
+            break;
+        }
+
+        case MMUEXT_CLEAR_PAGE:
+            page = get_page_from_gfn(pg_owner, op.arg1.mfn, &p2mt, P2M_ALLOC);
+            if ( unlikely(p2mt != p2m_ram_rw) && page )
+            {
+                put_page(page);
+                page = NULL;
+            }
+            if ( !page || !get_page_type(page, PGT_writable_page) )
+            {
+                if ( page )
+                    put_page(page);
+                gdprintk(XENLOG_WARNING,
+                         "Error clearing mfn %" PRI_mfn "\n", op.arg1.mfn);
+                rc = -EINVAL;
+                break;
+            }
+
+            /* A page is dirtied when it's being cleared. */
+            paging_mark_dirty(pg_owner, page_to_mfn(page));
+
+            clear_domain_page(page_to_mfn(page));
+
+            put_page_and_type(page);
+            break;
+
+        case MMUEXT_COPY_PAGE:
+        {
+            struct page_info *src_page, *dst_page;
+
+            src_page = get_page_from_gfn(pg_owner, op.arg2.src_mfn, &p2mt,
+                                         P2M_ALLOC);
+            if ( unlikely(p2mt != p2m_ram_rw) && src_page )
+            {
+                put_page(src_page);
+                src_page = NULL;
+            }
+            if ( unlikely(!src_page) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Error copying from mfn %" PRI_mfn "\n",
+                         op.arg2.src_mfn);
+                rc = -EINVAL;
+                break;
+            }
+
+            dst_page = get_page_from_gfn(pg_owner, op.arg1.mfn, &p2mt,
+                                         P2M_ALLOC);
+            if ( unlikely(p2mt != p2m_ram_rw) && dst_page )
+            {
+                put_page(dst_page);
+                dst_page = NULL;
+            }
+            rc = (dst_page &&
+                  get_page_type(dst_page, PGT_writable_page)) ? 0 : -EINVAL;
+            if ( unlikely(rc) )
+            {
+                put_page(src_page);
+                if ( dst_page )
+                    put_page(dst_page);
+                gdprintk(XENLOG_WARNING,
+                         "Error copying to mfn %" PRI_mfn "\n", op.arg1.mfn);
+                break;
+            }
+
+            /* A page is dirtied when it's being copied to. */
+            paging_mark_dirty(pg_owner, page_to_mfn(dst_page));
+
+            copy_domain_page(page_to_mfn(dst_page), page_to_mfn(src_page));
+
+            put_page_and_type(dst_page);
+            put_page(src_page);
+            break;
+        }
+
+        case MMUEXT_MARK_SUPER:
+        case MMUEXT_UNMARK_SUPER:
+            rc = -EOPNOTSUPP;
+            break;
+
+        default:
+            rc = -ENOSYS;
+            break;
+        }
+
+ done:
+        if ( unlikely(rc) )
+            break;
+
+        guest_handle_add_offset(uops, 1);
+    }
+
+    if ( rc == -ERESTART )
+    {
+        ASSERT(i < count);
+        rc = hypercall_create_continuation(
+            __HYPERVISOR_mmuext_op, "hihi",
+            uops, (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom);
+    }
+    else if ( curr->arch.old_guest_table )
+    {
+        XEN_GUEST_HANDLE_PARAM(void) null;
+
+        ASSERT(rc || i == count);
+        set_xen_guest_handle(null, NULL);
+        /*
+         * In order to have a way to communicate the final return value to
+         * our continuation, we pass this in place of "foreigndom", building
+         * on the fact that this argument isn't needed anymore.
+         */
+        rc = hypercall_create_continuation(
+                __HYPERVISOR_mmuext_op, "hihi", null,
+                MMU_UPDATE_PREEMPTED, null, rc);
+    }
+
+    put_pg_owner(pg_owner);
+
+    perfc_add(num_mmuext_ops, i);
+
+    /* Add incremental work we have done to the @done output parameter. */
+    if ( unlikely(!guest_handle_is_null(pdone)) )
+    {
+        done += i;
+        copy_to_guest(pdone, &done, 1);
+    }
+
+    return rc;
+}
+
+long do_mmu_update(XEN_GUEST_HANDLE_PARAM(mmu_update_t) ureqs,
+                   unsigned int count, XEN_GUEST_HANDLE_PARAM(uint) pdone,
+                   unsigned int foreigndom)
+{
+    struct mmu_update req;
+    void *va = NULL;
+    unsigned long gpfn, gmfn, mfn;
+    struct page_info *page;
+    unsigned int cmd, i = 0, done = 0, pt_dom;
+    struct vcpu *curr = current, *v = curr;
+    struct domain *d = v->domain, *pt_owner = d, *pg_owner;
+    mfn_t map_mfn = INVALID_MFN;
+    uint32_t xsm_needed = 0;
+    uint32_t xsm_checked = 0;
+    int rc = put_old_guest_table(curr);
+
+    if ( unlikely(rc) )
+    {
+        if ( likely(rc == -ERESTART) )
+            rc = hypercall_create_continuation(
+                     __HYPERVISOR_mmu_update, "hihi", ureqs, count, pdone,
+                     foreigndom);
+        return rc;
+    }
+
+    if ( unlikely(count == MMU_UPDATE_PREEMPTED) &&
+         likely(guest_handle_is_null(ureqs)) )
+    {
+        /*
+         * See the curr->arch.old_guest_table related
+         * hypercall_create_continuation() below.
+         */
+        return (int)foreigndom;
+    }
+
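+    /* Restarted hypercall: clear the flag and recover prior progress. */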
+    if ( unlikely(count & MMU_UPDATE_PREEMPTED) )
+    {
+        count &= ~MMU_UPDATE_PREEMPTED;
+        if ( unlikely(!guest_handle_is_null(pdone)) )
+            (void)copy_from_guest(&done, pdone, 1);
+    }
+    else
+        perfc_incr(calls_to_mmu_update);
+
+    if ( unlikely(!guest_handle_okay(ureqs, count)) )
+        return -EFAULT;
+
+    if ( (pt_dom = foreigndom >> 16) != 0 )
+    {
+        /* Pagetables belong to a foreign domain (PFD). */
+        if ( (pt_owner = rcu_lock_domain_by_id(pt_dom - 1)) == NULL )
+            return -ESRCH;
+
+        if ( pt_owner == d )
+            rcu_unlock_domain(pt_owner);
+        else if ( !pt_owner->vcpu || (v = pt_owner->vcpu[0]) == NULL )
+        {
+            rc = -EINVAL;
+            goto out;
+        }
+    }
+
+    if ( (pg_owner = get_pg_owner((uint16_t)foreigndom)) == NULL )
+    {
+        rc = -ESRCH;
+        goto out;
+    }
+
+    for ( i = 0; i < count; i++ )
+    {
+        if ( curr->arch.old_guest_table || (i && hypercall_preempt_check()) )
+        {
+            rc = -ERESTART;
+            break;
+        }
+
+        if ( unlikely(__copy_from_guest(&req, ureqs, 1) != 0) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
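+        /* The low bits of the PTE machine address encode the command. */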
+        cmd = req.ptr & (sizeof(l1_pgentry_t)-1);
+
+        switch ( cmd )
+        {
+            /*
+             * MMU_NORMAL_PT_UPDATE: Normal update to any level of page table.
+             * MMU_PT_UPDATE_PRESERVE_AD: As above but also preserve (OR)
+             * current A/D bits.
+             */
+        case MMU_NORMAL_PT_UPDATE:
+        case MMU_PT_UPDATE_PRESERVE_AD:
+        {
+            p2m_type_t p2mt;
+
+            rc = -EOPNOTSUPP;
+            if ( unlikely(paging_mode_refcounts(pt_owner)) )
+                break;
+
+            xsm_needed |= XSM_MMU_NORMAL_UPDATE;
+            if ( get_pte_flags(req.val) & _PAGE_PRESENT )
+            {
+                xsm_needed |= XSM_MMU_UPDATE_READ;
+                if ( get_pte_flags(req.val) & _PAGE_RW )
+                    xsm_needed |= XSM_MMU_UPDATE_WRITE;
+            }
+            if ( xsm_needed != xsm_checked )
+            {
+                rc = xsm_mmu_update(XSM_TARGET, d, pt_owner, pg_owner, xsm_needed);
+                if ( rc )
+                    break;
+                xsm_checked = xsm_needed;
+            }
+            rc = -EINVAL;
+
+            req.ptr -= cmd;
+            gmfn = req.ptr >> PAGE_SHIFT;
+            page = get_page_from_gfn(pt_owner, gmfn, &p2mt, P2M_ALLOC);
+
+            if ( p2m_is_paged(p2mt) )
+            {
+                ASSERT(!page);
+                p2m_mem_paging_populate(pg_owner, gmfn);
+                rc = -ENOENT;
+                break;
+            }
+
+            if ( unlikely(!page) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Could not get page for normal update\n");
+                break;
+            }
+
+            mfn = mfn_x(page_to_mfn(page));
+
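+            /* Keep the previous mapping if the same page is touched again. */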
+            if ( !mfn_eq(_mfn(mfn), map_mfn) )
+            {
+                if ( va )
+                    unmap_domain_page(va);
+                va = map_domain_page(_mfn(mfn));
+                map_mfn = _mfn(mfn);
+            }
+            va = _p(((unsigned long)va & PAGE_MASK) + (req.ptr & ~PAGE_MASK));
+
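+            /* Serialise against concurrent updates of this page table page. */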
+            if ( page_lock(page) )
+            {
+                switch ( page->u.inuse.type_info & PGT_type_mask )
+                {
+                case PGT_l1_page_table:
+                {
+                    l1_pgentry_t l1e = l1e_from_intpte(req.val);
+                    p2m_type_t l1e_p2mt = p2m_ram_rw;
+                    struct page_info *target = NULL;
+                    p2m_query_t q = (l1e_get_flags(l1e) & _PAGE_RW) ?
+                                        P2M_UNSHARE : P2M_ALLOC;
+
+                    if ( paging_mode_translate(pg_owner) )
+                        target = get_page_from_gfn(pg_owner, l1e_get_pfn(l1e),
+                                                   &l1e_p2mt, q);
+
+                    if ( p2m_is_paged(l1e_p2mt) )
+                    {
+                        if ( target )
+                            put_page(target);
+                        p2m_mem_paging_populate(pg_owner, l1e_get_pfn(l1e));
+                        rc = -ENOENT;
+                        break;
+                    }
+                    else if ( p2m_ram_paging_in == l1e_p2mt && !target )
+                    {
+                        rc = -ENOENT;
+                        break;
+                    }
+                    /* If we tried to unshare and failed */
+                    else if ( (q & P2M_UNSHARE) && p2m_is_shared(l1e_p2mt) )
+                    {
+                        /* We could not have obtained a page ref. */
+                        ASSERT(target == NULL);
+                        /* And mem_sharing_notify has already been called. */
+                        rc = -ENOMEM;
+                        break;
+                    }
+
+                    rc = mod_l1_entry(va, l1e, mfn,
+                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v,
+                                      pg_owner);
+                    if ( target )
+                        put_page(target);
+                }
+                break;
+                case PGT_l2_page_table:
+                    rc = mod_l2_entry(va, l2e_from_intpte(req.val), mfn,
+                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
+                    break;
+                case PGT_l3_page_table:
+                    rc = mod_l3_entry(va, l3e_from_intpte(req.val), mfn,
+                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
+                    break;
+                case PGT_l4_page_table:
+                    rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
+                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
+                    break;
+                case PGT_writable_page:
+                    perfc_incr(writable_mmu_updates);
+                    if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) )
+                        rc = 0;
+                    break;
+                }
+                page_unlock(page);
+                if ( rc == -EINTR )
+                    rc = -ERESTART;
+            }
+            else if ( get_page_type(page, PGT_writable_page) )
+            {
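+                /* Could not lock the page as a page table; try a plain write. */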
+                perfc_incr(writable_mmu_updates);
+                if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) )
+                    rc = 0;
+                put_page_type(page);
+            }
+
+            put_page(page);
+        }
+        break;
+
+        case MMU_MACHPHYS_UPDATE:
+            if ( unlikely(d != pt_owner) )
+            {
+                rc = -EPERM;
+                break;
+            }
+
+            if ( unlikely(paging_mode_translate(pg_owner)) )
+            {
+                rc = -EINVAL;
+                break;
+            }
+
+            mfn = req.ptr >> PAGE_SHIFT;
+            gpfn = req.val;
+
+            xsm_needed |= XSM_MMU_MACHPHYS_UPDATE;
+            if ( xsm_needed != xsm_checked )
+            {
+                rc = xsm_mmu_update(XSM_TARGET, d, NULL, pg_owner, xsm_needed);
+                if ( rc )
+                    break;
+                xsm_checked = xsm_needed;
+            }
+
+            if ( unlikely(!get_page_from_mfn(_mfn(mfn), pg_owner)) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Could not get page for mach->phys update\n");
+                rc = -EINVAL;
+                break;
+            }
+
+            set_gpfn_from_mfn(mfn, gpfn);
+
+            paging_mark_dirty(pg_owner, _mfn(mfn));
+
+            put_page(mfn_to_page(_mfn(mfn)));
+            break;
+
+        default:
+            rc = -ENOSYS;
+            break;
+        }
+
+        if ( unlikely(rc) )
+            break;
+
+        guest_handle_add_offset(ureqs, 1);
+    }
+
+    if ( rc == -ERESTART )
+    {
+        ASSERT(i < count);
+        rc = hypercall_create_continuation(
+            __HYPERVISOR_mmu_update, "hihi",
+            ureqs, (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom);
+    }
+    else if ( curr->arch.old_guest_table )
+    {
+        XEN_GUEST_HANDLE_PARAM(void) null;
+
+        ASSERT(rc || i == count);
+        set_xen_guest_handle(null, NULL);
+        /*
+         * In order to have a way to communicate the final return value to
+         * our continuation, we pass this in place of "foreigndom", building
+         * on the fact that this argument isn't needed anymore.
+         */
+        rc = hypercall_create_continuation(
+                __HYPERVISOR_mmu_update, "hihi", null,
+                MMU_UPDATE_PREEMPTED, null, rc);
+    }
+
+    put_pg_owner(pg_owner);
+
+    if ( va )
+        unmap_domain_page(va);
+
+    perfc_add(num_page_updates, i);
+
+ out:
+    if ( pt_owner != d )
+        rcu_unlock_domain(pt_owner);
+
+    /* Add incremental work we have done to the @done output parameter. */
+    if ( unlikely(!guest_handle_is_null(pdone)) )
+    {
+        done += i;
+        copy_to_guest(pdone, &done, 1);
+    }
+
+    return rc;
+}
+
+static int __do_update_va_mapping(unsigned long va, uint64_t val64,
+                                  unsigned long flags, struct domain *pg_owner)
+{
+    l1_pgentry_t   val = l1e_from_intpte(val64);
+    struct vcpu   *curr = current;
+    struct domain *currd = curr->domain;
+    struct page_info *gl1pg;
+    l1_pgentry_t  *pl1e;
+    unsigned long  bmap_ptr;
+    mfn_t          gl1mfn;
+    cpumask_t     *mask = NULL;
+    int            rc;
+
+    perfc_incr(calls_to_update_va);
+
+    rc = xsm_update_va_mapping(XSM_TARGET, currd, pg_owner, val);
+    if ( rc )
+        return rc;
+
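+    /* Map the guest l1e of @va and take a ref on the page containing it. */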
+    rc = -EINVAL;
+    pl1e = map_guest_l1e(va, &gl1mfn);
+    if ( unlikely(!pl1e || !get_page_from_mfn(gl1mfn, currd)) )
+        goto out;
+
+    gl1pg = mfn_to_page(gl1mfn);
+    if ( !page_lock(gl1pg) )
+    {
+        put_page(gl1pg);
+        goto out;
+    }
+
+    if ( (gl1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(gl1pg);
+        put_page(gl1pg);
+        goto out;
+    }
+
+    rc = mod_l1_entry(pl1e, val, mfn_x(gl1mfn), 0, curr, pg_owner);
+
+    page_unlock(gl1pg);
+    put_page(gl1pg);
+
+ out:
+    if ( pl1e )
+        unmap_domain_page(pl1e);
+
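+    /* Perform the flush the guest asked for: local, all, or a vcpu mask. */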
+    switch ( flags & UVMF_FLUSHTYPE_MASK )
+    {
+    case UVMF_TLB_FLUSH:
+        switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) )
+        {
+        case UVMF_LOCAL:
+            flush_tlb_local();
+            break;
+        case UVMF_ALL:
+            mask = currd->domain_dirty_cpumask;
+            break;
+        default:
+            mask = this_cpu(scratch_cpumask);
+            rc = vcpumask_to_pcpumask(currd,
+                                      const_guest_handle_from_ptr(bmap_ptr, void),
+                                      mask);
+            break;
+        }
+        if ( mask )
+            flush_tlb_mask(mask);
+        break;
+
+    case UVMF_INVLPG:
+        switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) )
+        {
+        case UVMF_LOCAL:
+            paging_invlpg(curr, va);
+            break;
+        case UVMF_ALL:
+            mask = currd->domain_dirty_cpumask;
+            break;
+        default:
+            mask = this_cpu(scratch_cpumask);
+            rc = vcpumask_to_pcpumask(currd,
+                                      const_guest_handle_from_ptr(bmap_ptr, void),
+                                      mask);
+            break;
+        }
+        if ( mask )
+            flush_tlb_one_mask(mask, va);
+        break;
+    }
+
+    return rc;
+}
+
+long do_update_va_mapping(unsigned long va, uint64_t val64,
+                          unsigned long flags)
+{
+    return __do_update_va_mapping(va, val64, flags, current->domain);
+}
+
+long do_update_va_mapping_otherdomain(unsigned long va, uint64_t val64,
+                                      unsigned long flags,
+                                      domid_t domid)
+{
+    struct domain *pg_owner;
+    int rc;
+
+    if ( (pg_owner = get_pg_owner(domid)) == NULL )
+        return -ESRCH;
+
+    rc = __do_update_va_mapping(va, val64, flags, pg_owner);
+
+    put_pg_owner(pg_owner);
+
+    return rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v5 23/23] x86/mm: remove the now unused inclusion of pv/mm.h
  2017-09-14 12:58 [PATCH v5 00/23] x86: refactor mm.c Wei Liu
                   ` (21 preceding siblings ...)
  2017-09-14 12:58 ` [PATCH v5 22/23] x86/mm: split out PV mm hypercalls to pv/mm-hypercalls.c Wei Liu
@ 2017-09-14 12:58 ` Wei Liu
  22 siblings, 0 replies; 57+ messages in thread
From: Wei Liu @ 2017-09-14 12:58 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 2ffcc53c6c..de66a5272c 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -127,8 +127,6 @@
 #include <asm/pv/grant_table.h>
 #include <asm/pv/mm.h>
 
-#include "pv/mm.h"
-
 /* Override macros from asm/page.h to make them work with mfn_t */
 #undef mfn_to_page
 #define mfn_to_page(mfn) __mfn_to_page(mfn_x(mfn))
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 01/23] x86/mm: move guest_get_eff_l1e to pv/mm.h
       [not found]   ` <1505408055.662832341@apps.rackspace.com>
@ 2017-09-14 16:58     ` Wei Liu
  0 siblings, 0 replies; 57+ messages in thread
From: Wei Liu @ 2017-09-14 16:58 UTC (permalink / raw)
  To: Kent R. Spillner; +Cc: Xen-devel, Wei Liu

Add back xen-devel

On Thu, Sep 14, 2017 at 11:54:15AM -0500, Kent R. Spillner wrote:
> Two minor typos in commit message:
> 
> > Mkae it static inline. It well be used by map_ldt_shadow_page and ro
> 
> s/Mkae/Make/; s/It well/It will/
> 

Thanks. I will fix them.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 01/23] x86/mm: move guest_get_eff_l1e to pv/mm.h
  2017-09-14 12:58 ` [PATCH v5 01/23] x86/mm: move guest_get_eff_l1e to pv/mm.h Wei Liu
       [not found]   ` <1505408055.662832341@apps.rackspace.com>
@ 2017-09-22 11:36   ` Jan Beulich
  1 sibling, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2017-09-22 11:36 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Xen-devel

>>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> Mkae it static inline. It well be used by map_ldt_shadow_page and ro

"Make" and "will" and ...

> page fault emulation code later.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 02/23] x86/mm: export get_page_from_mfn
  2017-09-14 12:58 ` [PATCH v5 02/23] x86/mm: export get_page_from_mfn Wei Liu
@ 2017-09-22 11:44   ` Jan Beulich
  2017-09-22 11:51     ` Wei Liu
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2017-09-22 11:44 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Xen-devel

>>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -705,22 +705,6 @@ bool map_ldt_shadow_page(unsigned int offset)
>      return true;
>  }
>  
> -
> -static bool get_page_from_mfn(mfn_t mfn, struct domain *d)
> -{
> -    struct page_info *page = mfn_to_page(mfn);
> -
> -    if ( unlikely(!mfn_valid(mfn)) || unlikely(!get_page(page, d)) )
> -    {
> -        gdprintk(XENLOG_WARNING,
> -                 "Could not get page ref for mfn %"PRI_mfn"\n", mfn_x(mfn));
> -        return false;
> -    }
> -
> -    return true;
> -}
> -
> -
>  static int get_page_and_type_from_mfn(
>      mfn_t mfn, unsigned long type, struct domain *d,
>      int partial, int preemptible)
> diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
> index bef45e8e9f..7670912e0a 100644
> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -369,6 +369,20 @@ int  get_page_from_l1e(
>      l1_pgentry_t l1e, struct domain *l1e_owner, struct domain *pg_owner);
>  void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner);
>  
> +static inline bool get_page_from_mfn(mfn_t mfn, struct domain *d)
> +{
> +    struct page_info *page = __mfn_to_page(mfn_x(mfn));

Why can this not remain to be mfn_to_page(mfn)? The header has
been introduced only in the immediately preceding patch, so there
can't be users of the header not coping with this. And once other
code includes this header, it should be made type-safe first.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 02/23] x86/mm: export get_page_from_mfn
  2017-09-22 11:44   ` Jan Beulich
@ 2017-09-22 11:51     ` Wei Liu
  2017-09-22 12:11       ` Jan Beulich
  0 siblings, 1 reply; 57+ messages in thread
From: Wei Liu @ 2017-09-22 11:51 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Andrew Cooper, WeiLiu, Xen-devel

On Fri, Sep 22, 2017 at 05:44:58AM -0600, Jan Beulich wrote:
> >>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -705,22 +705,6 @@ bool map_ldt_shadow_page(unsigned int offset)
> >      return true;
> >  }
> >  
> > -
> > -static bool get_page_from_mfn(mfn_t mfn, struct domain *d)
> > -{
> > -    struct page_info *page = mfn_to_page(mfn);
> > -
> > -    if ( unlikely(!mfn_valid(mfn)) || unlikely(!get_page(page, d)) )
> > -    {
> > -        gdprintk(XENLOG_WARNING,
> > -                 "Could not get page ref for mfn %"PRI_mfn"\n", mfn_x(mfn));
> > -        return false;
> > -    }
> > -
> > -    return true;
> > -}
> > -
> > -
> >  static int get_page_and_type_from_mfn(
> >      mfn_t mfn, unsigned long type, struct domain *d,
> >      int partial, int preemptible)
> > diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
> > index bef45e8e9f..7670912e0a 100644
> > --- a/xen/include/asm-x86/mm.h
> > +++ b/xen/include/asm-x86/mm.h
> > @@ -369,6 +369,20 @@ int  get_page_from_l1e(
> >      l1_pgentry_t l1e, struct domain *l1e_owner, struct domain *pg_owner);
> >  void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner);
> >  
> > +static inline bool get_page_from_mfn(mfn_t mfn, struct domain *d)
> > +{
> > +    struct page_info *page = __mfn_to_page(mfn_x(mfn));
> 
> Why can this not remain to be mfn_to_page(mfn)? The header has
> been introduced only in the immediately preceding patch, so there
> can't be users of the header not coping with this. And once other
> code includes this header, it should be made type-safe first.

I think you misread.  This is asm-x86/mm.h, not pv/mm.h. This header
file has been used in many places. 

Furthermore, this patch introduces no change. If you look at mm.c at the
beginning:

   #undef mfn_to_page
   #define mfn_to_page(mfn) __mfn_to_page(mfn_x(mfn))

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 02/23] x86/mm: export get_page_from_mfn
  2017-09-22 11:51     ` Wei Liu
@ 2017-09-22 12:11       ` Jan Beulich
  0 siblings, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2017-09-22 12:11 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Xen-devel

>>> On 22.09.17 at 13:51, <wei.liu2@citrix.com> wrote:
> On Fri, Sep 22, 2017 at 05:44:58AM -0600, Jan Beulich wrote:
>> >>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
>> > --- a/xen/arch/x86/mm.c
>> > +++ b/xen/arch/x86/mm.c
>> > @@ -705,22 +705,6 @@ bool map_ldt_shadow_page(unsigned int offset)
>> >      return true;
>> >  }
>> >  
>> > -
>> > -static bool get_page_from_mfn(mfn_t mfn, struct domain *d)
>> > -{
>> > -    struct page_info *page = mfn_to_page(mfn);
>> > -
>> > -    if ( unlikely(!mfn_valid(mfn)) || unlikely(!get_page(page, d)) )
>> > -    {
>> > -        gdprintk(XENLOG_WARNING,
>> > -                 "Could not get page ref for mfn %"PRI_mfn"\n", 
> mfn_x(mfn));
>> > -        return false;
>> > -    }
>> > -
>> > -    return true;
>> > -}
>> > -
>> > -
>> >  static int get_page_and_type_from_mfn(
>> >      mfn_t mfn, unsigned long type, struct domain *d,
>> >      int partial, int preemptible)
>> > diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
>> > index bef45e8e9f..7670912e0a 100644
>> > --- a/xen/include/asm-x86/mm.h
>> > +++ b/xen/include/asm-x86/mm.h
>> > @@ -369,6 +369,20 @@ int  get_page_from_l1e(
>> >      l1_pgentry_t l1e, struct domain *l1e_owner, struct domain *pg_owner);
>> >  void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner);
>> >  
>> > +static inline bool get_page_from_mfn(mfn_t mfn, struct domain *d)
>> > +{
>> > +    struct page_info *page = __mfn_to_page(mfn_x(mfn));
>> 
>> Why can this not remain to be mfn_to_page(mfn)? The header has
>> been introduced only in the immediately preceding patch, so there
>> can't be users of the header not coping with this. And once other
>> code includes this header, it should be made type-safe first.
> 
> I think you misread.  This is asm-x86/mm.h, not pv/mm.h.

Oh, indeed. I'm sorry.

Acked-by: Jan Beulich <jbeulich@suse.com>

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 03/23] x86/mm: move update_intpte to pv/mm.h
  2017-09-14 12:58 ` [PATCH v5 03/23] x86/mm: move update_intpte to pv/mm.h Wei Liu
@ 2017-09-22 12:32   ` Jan Beulich
  0 siblings, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2017-09-22 12:32 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Xen-devel

>>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> --- a/xen/arch/x86/pv/mm.h
> +++ b/xen/arch/x86/pv/mm.h
> @@ -18,4 +18,69 @@ static inline l1_pgentry_t guest_get_eff_l1e(unsigned long linear)
>      return l1e;
>  }
>  
> +/*
> + * PTE updates can be done with ordinary writes except:
> + *  1. Debug builds get extra checking by using CMPXCHG[8B].
> + */
> +#if !defined(NDEBUG)

#ifdef

> +#define PTE_UPDATE_WITH_CMPXCHG

#else
#undef

now that it sits in a header, to be on the safe side.
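
Spelled out, presumably the suggestion is for the header to read
roughly:

    #ifndef NDEBUG
    #define PTE_UPDATE_WITH_CMPXCHG
    #else
    #undef  PTE_UPDATE_WITH_CMPXCHG
    #endif

so that a stale definition can't leak in from elsewhere.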

With that

Acked-by: Jan Beulich <jbeulich@suse.com>

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 04/23] x86/mm: move {un, }adjust_guest_l*e to pv/mm.h
  2017-09-14 12:58 ` [PATCH v5 04/23] x86/mm: move {un, }adjust_guest_l*e " Wei Liu
@ 2017-09-22 12:33   ` Jan Beulich
  0 siblings, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2017-09-22 12:33 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Xen-devel

>>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 05/23] x86/mm: move ro page fault emulation code
  2017-09-14 12:58 ` [PATCH v5 05/23] x86/mm: move ro page fault emulation code Wei Liu
@ 2017-09-22 12:37   ` Jan Beulich
  0 siblings, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2017-09-22 12:37 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Xen-devel

>>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> --- /dev/null
> +++ b/xen/include/asm-x86/pv/mm.h
> @@ -0,0 +1,38 @@
> +/*
> + * asm-x86/pv/mm.h
> + *
> + * Memory management interfaces for PV guests
> + *
> + * Copyright (C) 2017 Wei Liu <wei.liu2@citrix.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms and conditions of the GNU General Public
> + * License, version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef __X86_PV_MM_H__
> +#define __X86_PV_MM_H__
> +
> +#ifdef CONFIG_PV
> +
> +int pv_ro_page_fault(unsigned long addr, struct cpu_user_regs *regs);
> +
> +#else
> +
> +static inline int pv_ro_page_fault(unsigned long addr,
> +                                   struct cpu_user_regs *regs)
> +{

ASSERT_UNREACHABLE()? In any event
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 06/23] x86/mm: remove the now unused inclusion of pv/emulate.h
  2017-09-14 12:58 ` [PATCH v5 06/23] x86/mm: remove the now unused inclusion of pv/emulate.h Wei Liu
@ 2017-09-22 12:37   ` Jan Beulich
  0 siblings, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2017-09-22 12:37 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Xen-devel

>>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 07/23] x86/mm: move map_guest_l1e to pv/mm.c
  2017-09-14 12:58 ` [PATCH v5 07/23] x86/mm: move map_guest_l1e to pv/mm.c Wei Liu
@ 2017-09-22 12:58   ` Jan Beulich
  0 siblings, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2017-09-22 12:58 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Xen-devel

>>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> And export the function via pv/mm.h.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

I'm not overly happy with the function becoming non-static, but I
can see that it'll be better to move grant table stuff and update-va-
mapping into different files, so
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 08/23] x86/mm: split out pv grant table code
  2017-09-14 12:58 ` [PATCH v5 08/23] x86/mm: split out pv grant table code Wei Liu
@ 2017-09-22 12:59   ` Jan Beulich
  0 siblings, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2017-09-22 12:59 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Xen-devel

>>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> Move the code to pv/grant_table.c. Nothing needs to be done with
> regard to headers.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 09/23] x86/mm: add pv prefix to {set, destroy}_gdt
  2017-09-14 12:58 ` [PATCH v5 09/23] x86/mm: add pv prefix to {set, destroy}_gdt Wei Liu
@ 2017-09-22 13:02   ` Jan Beulich
  0 siblings, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2017-09-22 13:02 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Xen-devel

>>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> --- a/xen/include/asm-x86/pv/mm.h
> +++ b/xen/include/asm-x86/pv/mm.h
> @@ -25,14 +25,24 @@
>  
>  int pv_ro_page_fault(unsigned long addr, struct cpu_user_regs *regs);
>  
> +long pv_set_gdt(struct vcpu *d, unsigned long *frames, unsigned int entries);
> +void pv_destroy_gdt(struct vcpu *d);
> +
>  #else
>  
> +#include <xen/errno.h>
> +
>  static inline int pv_ro_page_fault(unsigned long addr,
>                                     struct cpu_user_regs *regs)
>  {
>      return 0;
>  }
>  
> +static inline long pv_set_gdt(struct vcpu *d, unsigned long *frames,
> +                              unsigned int entries)
> +{ return -EINVAL; }
> +static inline void pv_destroy_gdt(struct vcpu *d) {}

Please everywhere here switch the parameter names from d to v.
With that and again maybe ASSERT_UNREACHABLE() added to the
stubs
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 10/23] x86/mm: split out descriptor table manipulation code
  2017-09-14 12:58 ` [PATCH v5 10/23] x86/mm: split out descriptor table manipulation code Wei Liu
@ 2017-09-22 13:07   ` Jan Beulich
  2017-09-22 13:12     ` Wei Liu
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2017-09-22 13:07 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Xen-devel

>>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> Move the code to pv/descriptor-tables.c. Change u64 to uint64_t while
> moving. Use currd in do_update_descriptor.

Hmm, so the "later" in patch 9 isn't in a future series, but here.
Why couldn't the move and rename then be done in one step?
Anyway ...

> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 11/23] x86/mm: move compat descriptor table manipulation code
  2017-09-14 12:58 ` [PATCH v5 11/23] x86/mm: move compat " Wei Liu
@ 2017-09-22 13:09   ` Jan Beulich
  0 siblings, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2017-09-22 13:09 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Xen-devel

>>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> --- a/xen/arch/x86/pv/descriptor-tables.c
> +++ b/xen/arch/x86/pv/descriptor-tables.c
> @@ -181,6 +181,46 @@ long do_update_descriptor(uint64_t pa, uint64_t desc)
>      return ret;
>  }
>  
> +int compat_set_gdt(XEN_GUEST_HANDLE_PARAM(uint) frame_list, unsigned int entries)
> +{
> +    unsigned int i, nr_pages = (entries + 511) / 512;
> +    unsigned long frames[16];
> +    long ret;

Considering the function returns int, how about changing this to int
as you go? Either way

Acked-by: Jan Beulich <jbeulich@suse.com>

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 10/23] x86/mm: split out descriptor table manipulation code
  2017-09-22 13:07   ` Jan Beulich
@ 2017-09-22 13:12     ` Wei Liu
  0 siblings, 0 replies; 57+ messages in thread
From: Wei Liu @ 2017-09-22 13:12 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Andrew Cooper, WeiLiu, Xen-devel

On Fri, Sep 22, 2017 at 07:07:29AM -0600, Jan Beulich wrote:
> >>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> > Move the code to pv/descriptor-tables.c. Change u64 to uint64_t while
> > moving. Use currd in do_update_descriptor.
> 
> Hmm, so the "later" in patch 9 isn't in a future series, but here.
> Why couldn't the move and rename then be done in one step?

Because I thought it would be easier to review if I kept the patches as
self-contained as possible, i.e. only did one or two closely related
things in each.

Overall I think this is a better strategy for reducing the cognitive
burden on reviewers.

Anyway, thanks for your review.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 12/23] x86/mm: move and rename map_ldt_shadow_page
  2017-09-14 12:58 ` [PATCH v5 12/23] x86/mm: move and rename map_ldt_shadow_page Wei Liu
@ 2017-09-22 13:18   ` Jan Beulich
  0 siblings, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2017-09-22 13:18 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Xen-devel

>>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> Add pv prefix to it. Move it to pv/mm.c. Fix call sites.
> 
> Take the chance to change v to curr and d to currd.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 13/23] x86/mm: factor out pv_arch_init_memory
  2017-09-14 12:58 ` [PATCH v5 13/23] x86/mm: factor out pv_arch_init_memory Wei Liu
@ 2017-09-22 13:20   ` Jan Beulich
  0 siblings, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2017-09-22 13:20 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Xen-devel

>>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> Move the split l4 setup code into the new function. It can then be
> moved to pv/ in a later patch.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 14/23] x86/mm: move PV l4 table setup code
  2017-09-14 12:58 ` [PATCH v5 14/23] x86/mm: move PV l4 table setup code Wei Liu
@ 2017-09-22 13:23   ` Jan Beulich
  0 siblings, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2017-09-22 13:23 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Xen-devel

>>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> @@ -32,6 +33,14 @@
>  #undef page_to_mfn
>  #define page_to_mfn(pg) _mfn(__page_to_mfn(pg))
>  
> +#ifndef NDEBUG
> +static unsigned int __read_mostly root_pgt_pv_xen_slots
> +    = ROOT_PAGETABLE_PV_XEN_SLOTS;
> +static l4_pgentry_t __read_mostly split_l4e;
> +#else
> +#define root_pgt_pv_xen_slots ROOT_PAGETABLE_PV_XEN_SLOTS
> +#endif
> +
>  /*
>   * Get a mapping of a PV guest's l1e for this linear address.  The return
>   * pointer should be unmapped using unmap_domain_page().
> @@ -133,6 +142,79 @@ bool pv_map_ldt_shadow_page(unsigned int offset)
>      return true;
>  }
>  
> +/*
> + * This function must write all ROOT_PAGETABLE_PV_XEN_SLOTS, to clobber any
> + * values a guest may have left there from alloc_l4_table().
> + */
> +void init_guest_l4_table(l4_pgentry_t l4tab[], const struct domain *d,
> +                         bool zap_ro_mpt)
> +{

Please move the static variables down right before this function,
so that what belongs together stays together. With that
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 15/23] x86/mm: move declaration of new_guest_cr3 to local pv/mm.h
  2017-09-14 12:58 ` [PATCH v5 15/23] x86/mm: move declaration of new_guest_cr3 to local pv/mm.h Wei Liu
@ 2017-09-22 13:23   ` Jan Beulich
  0 siblings, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2017-09-22 13:23 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Xen-devel

>>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> It is only used by PV. The code can only be moved together with other
> PV mm code.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 16/23] x86/mm: add pv prefix to {alloc, free}_page_type
  2017-09-14 12:58 ` [PATCH v5 16/23] x86/mm: add pv prefix to {alloc, free}_page_type Wei Liu
@ 2017-09-22 13:40   ` Jan Beulich
  2017-09-22 14:07     ` Wei Liu
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2017-09-22 13:40 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Xen-devel

>>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> And move the declarations to pv/mm.h. The code will be moved later.
> 
> The stubs contain BUG() because they aren't supposed to be called when
> PV is disabled.

I'd prefer ASSERT_UNREACHABLE() - they return proper errors
after all, and there's no need to bring down a production system.
Additionally could you add (half) a sentence regarding the
PGT_l*_page_table uses outside of PV specific code, which I'm
sure you have verified can't make it into the stubs?

> --- a/xen/include/asm-x86/pv/mm.h
> +++ b/xen/include/asm-x86/pv/mm.h
> @@ -32,6 +32,11 @@ bool pv_map_ldt_shadow_page(unsigned int off);
>  
>  void pv_arch_init_memory(void);
>  
> +int pv_alloc_page_type(struct page_info *page, unsigned long type,
> +                       int preemptible);
> +int pv_free_page_type(struct page_info *page, unsigned long type,
> +                      int preemptible);
> +
>  #else
>  
>  #include <xen/errno.h>
> @@ -51,6 +56,13 @@ static inline bool pv_map_ldt_shadow_page(unsigned int off) { return false; }
>  
>  static inline void pv_arch_init_memory(void) {}
>  
> +static inline int pv_alloc_page_type(struct page_info *page, unsigned long type,
> +                                     int preemptible)
> +{ BUG(); return -EINVAL; }
> +static inline int pv_free_page_type(struct page_info *page, unsigned long type,
> +                                    int preemptible)
> +{ BUG(); return -EINVAL; }

Take the opportunity and make all the "preemptible" parameters bool?
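
(Taken together with the ASSERT_UNREACHABLE() remark above, the stubs
would then look something like

    static inline int pv_alloc_page_type(struct page_info *page,
                                         unsigned long type, bool preemptible)
    { ASSERT_UNREACHABLE(); return -EINVAL; }

-- only an illustration of the shape being asked for, of course.)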

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 17/23] x86/mm: export base_disallow_mask and l1 mask in asm-x86/mm.h
  2017-09-14 12:58 ` [PATCH v5 17/23] x86/mm: export base_disallow_mask and l1 mask in asm-x86/mm.h Wei Liu
@ 2017-09-22 13:52   ` Jan Beulich
  2017-09-22 15:52     ` Wei Liu
  2017-09-23 16:52     ` Tim Deegan
  0 siblings, 2 replies; 57+ messages in thread
From: Jan Beulich @ 2017-09-22 13:52 UTC (permalink / raw)
  To: WeiLiu, Tim Deegan; +Cc: George Dunlap, Andrew Cooper, Xen-devel

>>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> The l1 mask needs to stay in x86/mm.c while l{2,3,4} masks are only
> needed by PV code. Both x86 common mm code and PV mm code use
> base_disallow_mask and l1 mask.
> 
> Export base_disallow_mask and l1 mask in asm-x86/mm.h.

So that's because in patch 20 you need to keep
get_page_from_l1e() in x86/mm.c, due to being used by shadow
code. But is shadow using the same disallow mask for HVM guests
actually correct? Perhaps it would be better for callers of
get_page_from_l1e() to pass in their disallow masks, even if it
would just so happen that PV and shadow use the same ones?
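
(Concretely, that would mean a signature along the lines of

    int get_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner,
                          struct domain *pg_owner, uint32_t disallow_mask);

with each caller supplying whichever mask is appropriate for it - just
a sketch of the idea, not a worked out change.)
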
Tim, do you have any thoughts or insights here?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 16/23] x86/mm: add pv prefix to {alloc, free}_page_type
  2017-09-22 13:40   ` Jan Beulich
@ 2017-09-22 14:07     ` Wei Liu
  2017-09-22 14:24       ` Jan Beulich
  0 siblings, 1 reply; 57+ messages in thread
From: Wei Liu @ 2017-09-22 14:07 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Andrew Cooper, WeiLiu, Xen-devel

On Fri, Sep 22, 2017 at 07:40:04AM -0600, Jan Beulich wrote:
> >>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> > And move the declarations to pv/mm.h. The code will be moved later.
> > 
> > The stubs contain BUG() because they aren't supposed to be called when
> > PV is disabled.
> 
> I'd prefer ASSERT_UNREACHABLE() - they return proper errors

ASSERT_UNREACHABLE() -- no problem.

> after all, and there's no need to bring down a production system.
> Additionally could you add (half) a sentence regarding the
> PGT_l*_page_table uses outside of PV specific code, which I'm
> sure you have verified can't make it into the stubs?

At this stage I can only verify it by reading the code. I can't turn off
PV yet.

To allocate a PGT_l*_page_table type page, the guest must explicitly
request such types via the PV MMU hypercall; there is no code other than
the PV dom0 builder and p2m_alloc_ptp in the hypervisor that would
explicitly ask for PGT_l*_page_table type pages.

p2m_alloc_table is a bit tricky. I think it can get away with not using
PGT_l*_page_table type pages, but that is work for another day.  Anyway,
currently it frees the pages directly with the free_page function of
the respective paging modes, none of which goes into PV mm code.

So my conclusion from reading the code is that non-PV code can't reach
the stubs.

> 
> > --- a/xen/include/asm-x86/pv/mm.h
> > +++ b/xen/include/asm-x86/pv/mm.h
> > @@ -32,6 +32,11 @@ bool pv_map_ldt_shadow_page(unsigned int off);
> >  
> >  void pv_arch_init_memory(void);
> >  
> > +int pv_alloc_page_type(struct page_info *page, unsigned long type,
> > +                       int preemptible);
> > +int pv_free_page_type(struct page_info *page, unsigned long type,
> > +                      int preemptible);
> > +
> >  #else
> >  
> >  #include <xen/errno.h>
> > @@ -51,6 +56,13 @@ static inline bool pv_map_ldt_shadow_page(unsigned int off) { return false; }
> >  
> >  static inline void pv_arch_init_memory(void) {}
> >  
> > +static inline int pv_alloc_page_type(struct page_info *page, unsigned long type,
> > +                                     int preemptible)
> > +{ BUG(); return -EINVAL; }
> > +static inline int pv_free_page_type(struct page_info *page, unsigned long type,
> > +                                    int preemptible)
> > +{ BUG(); return -EINVAL; }
> 
> Take the opportunity and make all the "preemptible" parameters bool?

No problem.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 16/23] x86/mm: add pv prefix to {alloc, free}_page_type
  2017-09-22 14:07     ` Wei Liu
@ 2017-09-22 14:24       ` Jan Beulich
  2017-09-22 14:34         ` Wei Liu
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2017-09-22 14:24 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Xen-devel

>>> On 22.09.17 at 16:07, <wei.liu2@citrix.com> wrote:
> On Fri, Sep 22, 2017 at 07:40:04AM -0600, Jan Beulich wrote:
>> Additionally could you add (half) a sentence regarding the
>> PGT_l*_page_table uses outside of PV specific code, which I'm
>> sure you have verified can't make it into the stubs?
> 
> At this stage I can only verify it by reading the code. I can't turn off
> PV yet.

Of course, that's what I did mean.

> To allocate a PGT_l*_page_table type page, the guest must explicitly
> request such types via the PV MMU hypercall; there is no code other than
> the PV dom0 builder and p2m_alloc_ptp in the hypervisor that would
> explicitly ask for PGT_l*_page_table type pages.
> 
> p2m_alloc_table is a bit tricky. I think it can get away with not using
> PGT_l*_page_table type pages, but that is work for another day.  Anyway,
> currently it frees the pages directly with the free_page function of
> the respective paging modes, none of which goes into PV mm code.

Right, hence the request to extend the description a little.
shadow_enable() also has a use that's not so obviously not
a problem.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 16/23] x86/mm: add pv prefix to {alloc, free}_page_type
  2017-09-22 14:24       ` Jan Beulich
@ 2017-09-22 14:34         ` Wei Liu
  0 siblings, 0 replies; 57+ messages in thread
From: Wei Liu @ 2017-09-22 14:34 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Andrew Cooper, WeiLiu, Xen-devel

On Fri, Sep 22, 2017 at 08:24:20AM -0600, Jan Beulich wrote:
> >>> On 22.09.17 at 16:07, <wei.liu2@citrix.com> wrote:
> > On Fri, Sep 22, 2017 at 07:40:04AM -0600, Jan Beulich wrote:
> >> Additionally could you add (half) a sentence regarding the
> >> PGT_l*_page_table uses outside of PV specific code, which I'm
> >> sure you have verified can't make it into the stubs?
> > 
> > At this stage I can only verify it by reading the code. I can't turn off
> > PV yet.
> 
> Of course, that's what I did mean.
> 
> > To allocate a PGT_l*_page_table type page, the guest must explicitly
> > request such types via the PV MMU hypercall; there is no code other than
> > the PV dom0 builder and p2m_alloc_ptp in the hypervisor that would
> > explicitly ask for PGT_l*_page_table type pages.
> > 
> > p2m_alloc_table is a bit tricky. I think it can get away with not using
> > PGT_l*_page_table type pages, but that is work for another day.  Anyway,
> > currently it frees the pages directly with the free_page function of
> > the respective paging modes, none of which goes into PV mm code.
> 
> Right, hence the request to extend the description a little.
> shadow_enable() also has a use that's not so obviously not
> a problem.

The call chain is paging_enable -> shadow_enable. paging_enable is only
called by hvm code.

The teardown in shadow_final_teardown is also done with p2m_teardown,
which goes through the free_page function of the respective paging mode.

All in all, it is not a problem in that code path either.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 18/23] x86/mm: export some stuff via local mm.h
  2017-09-14 12:58 ` [PATCH v5 18/23] x86/mm: export some stuff via local mm.h Wei Liu
@ 2017-09-22 15:44   ` Jan Beulich
  2017-09-22 15:56     ` Wei Liu
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2017-09-22 15:44 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Xen-devel

>>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> They will be used by PV mm code and mm hypercall code, which is going
> to be split into two files.

Hmm, no, I think I'd prefer mod_lN_entry() and
{get,put}_page_from_lNe() to stay in the same file (with the
possible exception of {get,put}_page_from_l1e(), as under
discussion elsewhere).

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 19/23] x86/mm: export get_page_light via asm-x86/mm.h
  2017-09-14 12:58 ` [PATCH v5 19/23] x86/mm: export get_page_light via asm-x86/mm.h Wei Liu
@ 2017-09-22 15:49   ` Jan Beulich
  0 siblings, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2017-09-22 15:49 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Xen-devel

>>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> It is going to be needed by common x86 mm code and pv mm code.

The sole callers are alloc_page_type() and __put_final_page_type(),
and their respective code paths should be PV only. I'd also prefer if
the compiler remained able to decide whether it wants to inline
the function.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 17/23] x86/mm: export base_disallow_mask and l1 mask in asm-x86/mm.h
  2017-09-22 13:52   ` Jan Beulich
@ 2017-09-22 15:52     ` Wei Liu
  2017-09-23 16:52     ` Tim Deegan
  1 sibling, 0 replies; 57+ messages in thread
From: Wei Liu @ 2017-09-22 15:52 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Tim Deegan, WeiLiu, Xen-devel, Andrew Cooper

On Fri, Sep 22, 2017 at 07:52:13AM -0600, Jan Beulich wrote:
> >>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> > The l1 mask needs to stay in x86/mm.c while l{2,3,4} masks are only
> > needed by PV code. Both x86 common mm code and PV mm code use
> > base_disallow_mask and l1 mask.
> > 
> > Export base_disallow_mask and l1 mask in asm-x86/mm.h.
> 
> So that's because in patch 20 you need to keep
> get_page_from_l1e() in x86/mm.c, due to being used by shadow
> code. But is shadow using the same disallow mask for HVM guests
> actually correct? Perhaps it would be better for callers of
> get_page_from_l1e() to pass in their disallow masks, even if it
> would just so happen that PV and shadow use the same ones?

I think this is a fine idea, fwiw.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 20/23] x86/mm: split out PV mm code to pv/mm.c
  2017-09-14 12:58 ` [PATCH v5 20/23] x86/mm: split out PV mm code to pv/mm.c Wei Liu
@ 2017-09-22 15:53   ` Jan Beulich
  0 siblings, 0 replies; 57+ messages in thread
From: Jan Beulich @ 2017-09-22 15:53 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Xen-devel

>>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> A load of functions are moved:
> 
> 1. {get,put}_page_from_l{2,3,4}e

As just indicated in reply to another patch, I'd really hope for these
to stay together with mod_lN_entry().

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 18/23] x86/mm: export some stuff via local mm.h
  2017-09-22 15:44   ` Jan Beulich
@ 2017-09-22 15:56     ` Wei Liu
  2017-09-22 16:00       ` Jan Beulich
  0 siblings, 1 reply; 57+ messages in thread
From: Wei Liu @ 2017-09-22 15:56 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Andrew Cooper, WeiLiu, Xen-devel

On Fri, Sep 22, 2017 at 09:44:53AM -0600, Jan Beulich wrote:
> >>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> > They will be used by PV mm code and mm hypercall code, which is going
> > to be split into two files.
> 
> Hmm, no, I think I'd prefer mod_lN_entry() and
> {get,put}_page_from_lNe() to stay in the same file (with the
> possible exception of {get,put}_page_from_l1e(), as under
> discussion elsewhere).
> 

Seeing that mod_lN_entry's are only used by PV hypercall code I opted to
split them to mm-hypercalls.c in a later patch.

Now you want to put {get,put}_page_from_lNe and mod_lN_entry in the same
file, there are two choices:

1. Merge mm-hypercalls.c into pv/mm.c.
2. Leave mod_*_entry in pv/mm.c and keep mm-hypercalls.c -- this would
   require exporting mod_*_entry via local header file.

Which one do you prefer?

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 18/23] x86/mm: export some stuff via local mm.h
  2017-09-22 15:56     ` Wei Liu
@ 2017-09-22 16:00       ` Jan Beulich
  2017-09-22 16:07         ` Wei Liu
  0 siblings, 1 reply; 57+ messages in thread
From: Jan Beulich @ 2017-09-22 16:00 UTC (permalink / raw)
  To: WeiLiu; +Cc: George Dunlap, Andrew Cooper, Xen-devel

>>> On 22.09.17 at 17:56, <wei.liu2@citrix.com> wrote:
> On Fri, Sep 22, 2017 at 09:44:53AM -0600, Jan Beulich wrote:
>> >>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
>> > They will be used by PV mm code and mm hypercall code, which is going
>> > to be split into two files.
>> 
>> Hmm, no, I think I'd prefer mod_lN_entry() and
>> {get,put}_page_from_lNe() to stay in the same file (with the
>> possible exception of {get,put}_page_from_l1e(), as under
>> discussion elsewhere).
>> 
> 
> Seeing that mod_lN_entry's are only used by PV hypercall code I opted to
> split them to mm-hypercalls.c in a later patch.
> 
> Now you want to put {get,put}_page_from_lNe and mod_lN_entry in the same
> file, there are two choices:
> 
> 1. Merge mm-hypercalls.c into pv/mm.c.
> 2. Leave mod_*_entry in pv/mm.c and keep mm-hypercalls.c -- this would
>    require exporting mod_*_entry via local header file.
> 
> Which one do you prefer?

I think I'd prefer 1 over 2, but please give Andrew a chance to
voice his opinion too.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v5 18/23] x86/mm: export some stuff via local mm.h
  2017-09-22 16:00       ` Jan Beulich
@ 2017-09-22 16:07         ` Wei Liu
  2017-09-22 16:09           ` Andrew Cooper
  0 siblings, 1 reply; 57+ messages in thread
From: Wei Liu @ 2017-09-22 16:07 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Andrew Cooper, WeiLiu, Xen-devel

On Fri, Sep 22, 2017 at 10:00:20AM -0600, Jan Beulich wrote:
> >>> On 22.09.17 at 17:56, <wei.liu2@citrix.com> wrote:
> > On Fri, Sep 22, 2017 at 09:44:53AM -0600, Jan Beulich wrote:
> >> >>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> >> > They will be used by PV mm code and mm hypercall code, which is going
> >> > to be split into two files.
> >> 
> >> Hmm, no, I think I'd prefer mod_lN_entry() and
> >> {get,put}_page_from_lNe() to stay in the same file (with the
> >> possible exception of {get,put}_page_from_l1e(), as under
> >> discussion elsewhere).
> >> 
> > 
> > Seeing that the mod_lN_entry functions are only used by PV hypercall code, I
> > opted to split them out to mm-hypercalls.c in a later patch.
> > 
> > Now that you want {get,put}_page_from_lNe and mod_lN_entry in the same
> > file, there are two choices:
> > 
> > 1. Merge mm-hypercalls.c into pv/mm.c.
> > 2. Leave mod_*_entry in pv/mm.c and keep mm-hypercalls.c -- this would
> >    require exporting mod_*_entry via a local header file.
> > 
> > Which one do you prefer?
> 
> I think I'd prefer 1 over 2, but please give Andrew a chance to
> voice his opinion too.
> 

Definitely.


* Re: [PATCH v5 18/23] x86/mm: export some stuff via local mm.h
  2017-09-22 16:07         ` Wei Liu
@ 2017-09-22 16:09           ` Andrew Cooper
  0 siblings, 0 replies; 57+ messages in thread
From: Andrew Cooper @ 2017-09-22 16:09 UTC (permalink / raw)
  To: Wei Liu, Jan Beulich; +Cc: George Dunlap, Xen-devel

On 22/09/17 17:07, Wei Liu wrote:
> On Fri, Sep 22, 2017 at 10:00:20AM -0600, Jan Beulich wrote:
>>>>> On 22.09.17 at 17:56, <wei.liu2@citrix.com> wrote:
>>> On Fri, Sep 22, 2017 at 09:44:53AM -0600, Jan Beulich wrote:
>>>>>>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
>>>>> They will be used by PV mm code and mm hypercall code, which is going
>>>>> to be split into two files.
>>>> Hmm, no, I think I'd prefer mod_lN_entry() and
>>>> {get,put}_page_from_lNe() to stay in the same file (with the
>>>> possible exception of {get,put}_page_from_l1e(), as under
>>>> discussion elsewhere).
>>>>
>>> Seeing that the mod_lN_entry functions are only used by PV hypercall code, I
>>> opted to split them out to mm-hypercalls.c in a later patch.
>>>
>>> Now that you want {get,put}_page_from_lNe and mod_lN_entry in the same
>>> file, there are two choices:
>>>
>>> 1. Merge mm-hypercalls.c into pv/mm.c.
>>> 2. Leave mod_*_entry in pv/mm.c and keep mm-hypercalls.c -- this would
>>>    require exporting mod_*_entry via a local header file.
>>>
>>> Which one do you prefer?
>> I think I'd prefer 1 over 2, but please give Andrew a chance to
>> voice his opinion too.
>>
> Definitely.

I'd err on the side of 1) over 2) in this case.  I think it is more
important not to have these functions exported than to minimize the
size of each translation unit.
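
(As a rough sketch of what 1) buys us -- the names below are the real
ones, but bodies and parameter lists are elided for illustration:)

    /* pv/mm.c after absorbing mm-hypercalls.c: the helpers keep
     * internal linkage, so nothing needs exporting via a header;
     * the hypercall entry points call them directly. */
    static int mod_l1_entry(/* ...parameters elided... */);
    static int mod_l2_entry(/* ...parameters elided... */);
    static int mod_l3_entry(/* ...parameters elided... */);
    static int mod_l4_entry(/* ...parameters elided... */);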

~Andrew


* Re: [PATCH v5 17/23] x86/mm: export base_disallow_mask and l1 mask in asm-x86/mm.h
  2017-09-22 13:52   ` Jan Beulich
  2017-09-22 15:52     ` Wei Liu
@ 2017-09-23 16:52     ` Tim Deegan
  1 sibling, 0 replies; 57+ messages in thread
From: Tim Deegan @ 2017-09-23 16:52 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Xen-devel

At 07:52 -0600 on 22 Sep (1506066733), Jan Beulich wrote:
> >>> On 14.09.17 at 14:58, <wei.liu2@citrix.com> wrote:
> > The l1 mask needs to stay in x86/mm.c while l{2,3,4} masks are only
> > needed by PV code. Both the common x86 mm code and the PV mm code use
> > base_disallow_mask and the l1 mask.
> > 
> > Export base_disallow_mask and l1 mask in asm-x86/mm.h.
> 
> So that's because in patch 20 you need to keep
> get_page_from_l1e() in x86/mm.c, due to being used by shadow
> code. But is shadow using the same disallow mask for HVM guests
> actually correct? Perhaps it would be better for callers of
> get_page_from_l1e() to pass in their disallow masks, even if it
> would just so happen that PV and shadow use the same ones?
> Tim, do you have any thoughts or insights here?

IIRC the shadow code doesn't rely on the disallow mask for anything in
HVM guests; it builds the shadows according to its own rules.

I don't think that anything's improved by making the disallow mask
explicit but I have no objection to it if it makes other refactoring
simpler.
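
(For concreteness, the explicit-mask variant Jan describes might look
roughly like this -- a sketch only, with get_page_from_l1e()'s real
parameter list abbreviated and the logic reduced to the bare check:)

    /* Sketch: callers supply their own disallow mask instead of the
     * function deriving it internally. */
    int get_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner,
                          struct domain *pg_owner, uint32_t disallow_mask)
    {
        if ( l1e_get_flags(l1e) & disallow_mask )
            return -EINVAL;
        /* ...the existing checks and refcounting... */
        return 0;
    }

PV and shadow callers would then each pass their own mask, even if
both happened to be the same value today.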

Tim.


end of thread

Thread overview: 57+ messages
2017-09-14 12:58 [PATCH v5 00/23] x86: refactor mm.c Wei Liu
2017-09-14 12:58 ` [PATCH v5 01/23] x86/mm: move guest_get_eff_l1e to pv/mm.h Wei Liu
     [not found]   ` <1505408055.662832341@apps.rackspace.com>
2017-09-14 16:58     ` Wei Liu
2017-09-22 11:36   ` Jan Beulich
2017-09-14 12:58 ` [PATCH v5 02/23] x86/mm: export get_page_from_mfn Wei Liu
2017-09-22 11:44   ` Jan Beulich
2017-09-22 11:51     ` Wei Liu
2017-09-22 12:11       ` Jan Beulich
2017-09-14 12:58 ` [PATCH v5 03/23] x86/mm: move update_intpte to pv/mm.h Wei Liu
2017-09-22 12:32   ` Jan Beulich
2017-09-14 12:58 ` [PATCH v5 04/23] x86/mm: move {un,}adjust_guest_l*e " Wei Liu
2017-09-22 12:33   ` Jan Beulich
2017-09-14 12:58 ` [PATCH v5 05/23] x86/mm: move ro page fault emulation code Wei Liu
2017-09-22 12:37   ` Jan Beulich
2017-09-14 12:58 ` [PATCH v5 06/23] x86/mm: remove the now unused inclusion of pv/emulate.h Wei Liu
2017-09-22 12:37   ` Jan Beulich
2017-09-14 12:58 ` [PATCH v5 07/23] x86/mm: move map_guest_l1e to pv/mm.c Wei Liu
2017-09-22 12:58   ` Jan Beulich
2017-09-14 12:58 ` [PATCH v5 08/23] x86/mm: split out pv grant table code Wei Liu
2017-09-22 12:59   ` Jan Beulich
2017-09-14 12:58 ` [PATCH v5 09/23] x86/mm: add pv prefix to {set,destroy}_gdt Wei Liu
2017-09-22 13:02   ` Jan Beulich
2017-09-14 12:58 ` [PATCH v5 10/23] x86/mm: split out descriptor table manipulation code Wei Liu
2017-09-22 13:07   ` Jan Beulich
2017-09-22 13:12     ` Wei Liu
2017-09-14 12:58 ` [PATCH v5 11/23] x86/mm: move compat " Wei Liu
2017-09-22 13:09   ` Jan Beulich
2017-09-14 12:58 ` [PATCH v5 12/23] x86/mm: move and rename map_ldt_shadow_page Wei Liu
2017-09-22 13:18   ` Jan Beulich
2017-09-14 12:58 ` [PATCH v5 13/23] x86/mm: factor out pv_arch_init_memory Wei Liu
2017-09-22 13:20   ` Jan Beulich
2017-09-14 12:58 ` [PATCH v5 14/23] x86/mm: move PV l4 table setup code Wei Liu
2017-09-22 13:23   ` Jan Beulich
2017-09-14 12:58 ` [PATCH v5 15/23] x86/mm: move declaration of new_guest_cr3 to local pv/mm.h Wei Liu
2017-09-22 13:23   ` Jan Beulich
2017-09-14 12:58 ` [PATCH v5 16/23] x86/mm: add pv prefix to {alloc,free}_page_type Wei Liu
2017-09-22 13:40   ` Jan Beulich
2017-09-22 14:07     ` Wei Liu
2017-09-22 14:24       ` Jan Beulich
2017-09-22 14:34         ` Wei Liu
2017-09-14 12:58 ` [PATCH v5 17/23] x86/mm: export base_disallow_mask and l1 mask in asm-x86/mm.h Wei Liu
2017-09-22 13:52   ` Jan Beulich
2017-09-22 15:52     ` Wei Liu
2017-09-23 16:52     ` Tim Deegan
2017-09-14 12:58 ` [PATCH v5 18/23] x86/mm: export some stuff via local mm.h Wei Liu
2017-09-22 15:44   ` Jan Beulich
2017-09-22 15:56     ` Wei Liu
2017-09-22 16:00       ` Jan Beulich
2017-09-22 16:07         ` Wei Liu
2017-09-22 16:09           ` Andrew Cooper
2017-09-14 12:58 ` [PATCH v5 19/23] x86/mm: export get_page_light via asm-x86/mm.h Wei Liu
2017-09-22 15:49   ` Jan Beulich
2017-09-14 12:58 ` [PATCH v5 20/23] x86/mm: split out PV mm code to pv/mm.c Wei Liu
2017-09-22 15:53   ` Jan Beulich
2017-09-14 12:58 ` [PATCH v5 21/23] x86/mm: move and add pv prefix to invalidate_shadow_ldt Wei Liu
2017-09-14 12:58 ` [PATCH v5 22/23] x86/mm: split out PV mm hypercalls to pv/mm-hypercalls.c Wei Liu
2017-09-14 12:58 ` [PATCH v5 23/23] x86/mm: remove the now unused inclusion of pv/mm.h Wei Liu
