* [PATCH RFC 00/13] Refactor x86/mm.c
@ 2017-03-27  9:10 Wei Liu
  2017-03-27  9:10 ` [PATCH RFC 01/13] x86/mm: export {get,put}_pg_owner Wei Liu
                   ` (12 more replies)
  0 siblings, 13 replies; 30+ messages in thread
From: Wei Liu @ 2017-03-27  9:10 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Jan Beulich

This series splits x86/mm.c into PV- and HVM-specific files.

The long-term goal is to have a clear separation of guest-supporting
code so that we can disable modes as we see fit. This would also help
a later project to move the PV interface into a PVH container.

With more effort, some of the page manipulation code could probably be
split further, but I started this series with the intent of moving
three PV-only hypercalls into pv/mm.c and it has already grown into a
long series. I would rather leave further splitting for another day.

This is an RFC for now as it still awaits testing, and I would like
some opinions before I spend more effort on it.

Wei.

Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Tim Deegan <tim@xen.org>
Cc: George Dunlap <george.dunlap@eu.citrix.com>

Wei Liu (13):
  x86/mm: export {get,put}_pg_owner
  x86/mm: move MEM_LOG to asm-x86/mm.h
  x86/mm: export vcpumask_to_pcpumask
  x86/mm: carve out create_grant_pv_mapping
  x86/mm: carve out replace_grant_pv_mapping
  x86/mm: extract page table masks to mm.h
  x86/mm: extract PAGE_CACHE_ATTRS to mm.h
  x86/mm: extract adjust_guest_l*e macros to mm.h
  x86/mm: export a bunch of {get,put}_page functions
  x86/mm: export invalidate_shadow_ldt
  x86/mm: export create_pae_xen_mappings
  x86/mm: split PV MMU code to pv/mm.c
  x86/mm: split HVM grant table code to hvm/mm.c

 xen/arch/x86/hvm/Makefile |    1 +
 xen/arch/x86/hvm/mm.c     |   84 ++
 xen/arch/x86/mm.c         | 2095 ++-------------------------------------------
 xen/arch/x86/pv/Makefile  |    1 +
 xen/arch/x86/pv/mm.c      | 1902 ++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/mm.h  |  108 +++
 6 files changed, 2158 insertions(+), 2033 deletions(-)
 create mode 100644 xen/arch/x86/hvm/mm.c
 create mode 100644 xen/arch/x86/pv/mm.c

-- 
2.11.0



* [PATCH RFC 01/13] x86/mm: export {get,put}_pg_owner
  2017-03-27  9:10 [PATCH RFC 00/13] Refactor x86/mm.c Wei Liu
@ 2017-03-27  9:10 ` Wei Liu
  2017-03-28 21:11   ` [PATCH RFC 01/13] x86/mm: export {get, put}_pg_owner Andrew Cooper
  2017-03-27  9:10 ` [PATCH RFC 02/13] x86/mm: move MEM_LOG to asm-x86/mm.h Wei Liu
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Wei Liu @ 2017-03-27  9:10 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Jan Beulich

Prefix them with "mm_" and add declarations to asm-x86/mm.h.

They will be needed when we split PV-specific code out of x86/mm.c.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
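As a usage sketch (not part of the patch): code outside mm.c can now
take and drop the page-owner reference with the same pattern that
do_mmuext_op uses. The function name below is made up for illustration:

    long do_example_pv_op(domid_t foreigndom)
    {
        struct domain *pg_owner;
        long rc = 0;

        if ( (pg_owner = mm_get_pg_owner(foreigndom)) == NULL )
            return -ESRCH;

        /* ... operate on pg_owner's pages ... */

        mm_put_pg_owner(pg_owner);

        return rc;
    }
---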
 xen/arch/x86/mm.c        | 20 ++++++++++----------
 xen/include/asm-x86/mm.h |  4 ++++
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 4dbd24ff87..c0bdc667bf 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3025,7 +3025,7 @@ int new_guest_cr3(unsigned long mfn)
     return rc;
 }
 
-static struct domain *get_pg_owner(domid_t domid)
+struct domain *mm_get_pg_owner(domid_t domid)
 {
     struct domain *pg_owner = NULL, *curr = current->domain;
 
@@ -3068,7 +3068,7 @@ static struct domain *get_pg_owner(domid_t domid)
     return pg_owner;
 }
 
-static void put_pg_owner(struct domain *pg_owner)
+void mm_put_pg_owner(struct domain *pg_owner)
 {
     rcu_unlock_domain(pg_owner);
 }
@@ -3153,19 +3153,19 @@ long do_mmuext_op(
     if ( unlikely(!guest_handle_okay(uops, count)) )
         return -EFAULT;
 
-    if ( (pg_owner = get_pg_owner(foreigndom)) == NULL )
+    if ( (pg_owner = mm_get_pg_owner(foreigndom)) == NULL )
         return -ESRCH;
 
     if ( !is_pv_domain(pg_owner) )
     {
-        put_pg_owner(pg_owner);
+        mm_put_pg_owner(pg_owner);
         return -EINVAL;
     }
 
     rc = xsm_mmuext_op(XSM_TARGET, d, pg_owner);
     if ( rc )
     {
-        put_pg_owner(pg_owner);
+        mm_put_pg_owner(pg_owner);
         return rc;
     }
 
@@ -3640,7 +3640,7 @@ long do_mmuext_op(
                 MMU_UPDATE_PREEMPTED, null, rc);
     }
 
-    put_pg_owner(pg_owner);
+    mm_put_pg_owner(pg_owner);
 
     perfc_add(num_mmuext_ops, i);
 
@@ -3716,7 +3716,7 @@ long do_mmu_update(
         }
     }
 
-    if ( (pg_owner = get_pg_owner((uint16_t)foreigndom)) == NULL )
+    if ( (pg_owner = mm_get_pg_owner((uint16_t)foreigndom)) == NULL )
     {
         rc = -ESRCH;
         goto out;
@@ -3951,7 +3951,7 @@ long do_mmu_update(
                 MMU_UPDATE_PREEMPTED, null, rc);
     }
 
-    put_pg_owner(pg_owner);
+    mm_put_pg_owner(pg_owner);
 
     domain_mmap_cache_destroy(&mapcache);
 
@@ -4571,12 +4571,12 @@ long do_update_va_mapping_otherdomain(unsigned long va, u64 val64,
     struct domain *pg_owner;
     int rc;
 
-    if ( (pg_owner = get_pg_owner(domid)) == NULL )
+    if ( (pg_owner = mm_get_pg_owner(domid)) == NULL )
         return -ESRCH;
 
     rc = __do_update_va_mapping(va, val64, flags, pg_owner);
 
-    put_pg_owner(pg_owner);
+    mm_put_pg_owner(pg_owner);
 
     return rc;
 }
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index e22603ce98..e2baffa9a4 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -602,4 +602,8 @@ extern const char zero_page[];
 /* Build a 32bit PSE page table using 4MB pages. */
 void write_32bit_pse_identmap(uint32_t *l2);
 
+/* Grab the domain lock of target domain after passing checks */
+struct domain *mm_get_pg_owner(domid_t domid);
+void mm_put_pg_owner(struct domain *pg_owner);
+
 #endif /* __ASM_X86_MM_H__ */
-- 
2.11.0



* [PATCH RFC 02/13] x86/mm: move MEM_LOG to asm-x86/mm.h
  2017-03-27  9:10 [PATCH RFC 00/13] Refactor x86/mm.c Wei Liu
  2017-03-27  9:10 ` [PATCH RFC 01/13] x86/mm: export {get,put}_pg_owner Wei Liu
@ 2017-03-27  9:10 ` Wei Liu
  2017-03-28 21:14   ` Andrew Cooper
  2017-03-27  9:10 ` [PATCH RFC 03/13] x86/mm: export vcpumask_to_pcpumask Wei Liu
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Wei Liu @ 2017-03-27  9:10 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Jan Beulich

It will be needed later when we split mm.c.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
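For illustration, a typical call site elsewhere in mm.c (unchanged by
this patch) looks like:

    MEM_LOG("Bad L2 flags %x", l2e_get_flags(nl2e) & L2_DISALLOW_MASK);
---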
 xen/arch/x86/mm.c        | 2 --
 xen/include/asm-x86/mm.h | 2 ++
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index c0bdc667bf..a1d867bd94 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -127,8 +127,6 @@
 l1_pgentry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
     l1_fixmap[L1_PAGETABLE_ENTRIES];
 
-#define MEM_LOG(_f, _a...) gdprintk(XENLOG_WARNING , _f "\n" , ## _a)
-
 /*
  * PTE updates can be done with ordinary writes except:
  *  1. Debug builds get extra checking by using CMPXCHG[8B].
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index e2baffa9a4..6bc9866359 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -606,4 +606,6 @@ void write_32bit_pse_identmap(uint32_t *l2);
 struct domain *mm_get_pg_owner(domid_t domid);
 void mm_put_pg_owner(struct domain *pg_owner);
 
+#define MEM_LOG(_f, _a...) gdprintk(XENLOG_WARNING , _f "\n" , ## _a)
+
 #endif /* __ASM_X86_MM_H__ */
-- 
2.11.0



* [PATCH RFC 03/13] x86/mm: export vcpumask_to_pcpumask
  2017-03-27  9:10 [PATCH RFC 00/13] Refactor x86/mm.c Wei Liu
  2017-03-27  9:10 ` [PATCH RFC 01/13] x86/mm: export {get,put}_pg_owner Wei Liu
  2017-03-27  9:10 ` [PATCH RFC 02/13] x86/mm: move MEM_LOG to asm-x86/mm.h Wei Liu
@ 2017-03-27  9:10 ` Wei Liu
  2017-03-29 10:15   ` Andrew Cooper
  2017-03-27  9:10 ` [PATCH RFC 04/13] x86/mm: carve out create_grant_pv_mapping Wei Liu
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Wei Liu @ 2017-03-27  9:10 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Jan Beulich

Prefix it with "mm_". No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
It used to be an inline function, but I don't think there is any point
in keeping it inline, because PV MMU operations are already expensive;
a callq to another function won't make a noticeable difference.

It was not annotated as always_inline anyway, so the compiler was free
not to inline it: GCC 6.3 inlines it, but Clang 4 doesn't.

Let me know if you want me to make it an inline function.
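
For reference, keeping it inline would mean carrying the whole body in
asm-x86/mm.h, roughly as below (sketch only, body elided):

    static inline int mm_vcpumask_to_pcpumask(
        struct domain *d, XEN_GUEST_HANDLE_PARAM(const_void) bmap,
        cpumask_t *pmask)
    {
        /* ... same body as the out-of-line version in mm.c ... */
    }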
---
 xen/arch/x86/mm.c        | 17 ++++++++++-------
 xen/include/asm-x86/mm.h |  4 ++++
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index a1d867bd94..4bf94227cc 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3071,8 +3071,9 @@ void mm_put_pg_owner(struct domain *pg_owner)
     rcu_unlock_domain(pg_owner);
 }
 
-static inline int vcpumask_to_pcpumask(
-    struct domain *d, XEN_GUEST_HANDLE_PARAM(const_void) bmap, cpumask_t *pmask)
+int mm_vcpumask_to_pcpumask(struct domain *d,
+                            XEN_GUEST_HANDLE_PARAM(const_void) bmap,
+                            cpumask_t *pmask)
 {
     unsigned int vcpu_id, vcpu_bias, offs;
     unsigned long vmask;
@@ -3422,7 +3423,7 @@ long do_mmuext_op(
 
             if ( unlikely(d != pg_owner) )
                 rc = -EPERM;
-            else if ( unlikely(vcpumask_to_pcpumask(d,
+            else if ( unlikely(mm_vcpumask_to_pcpumask(d,
                                    guest_handle_to_param(op.arg2.vcpumask,
                                                          const_void),
                                    mask)) )
@@ -4523,9 +4524,10 @@ static int __do_update_va_mapping(
             break;
         default:
             mask = this_cpu(scratch_cpumask);
-            rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr,
+            rc = mm_vcpumask_to_pcpumask(d,
+                                         const_guest_handle_from_ptr(bmap_ptr,
                                                                      void),
-                                      mask);
+                                         mask);
             break;
         }
         if ( mask )
@@ -4543,9 +4545,10 @@ static int __do_update_va_mapping(
             break;
         default:
             mask = this_cpu(scratch_cpumask);
-            rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr,
+            rc = mm_vcpumask_to_pcpumask(d,
+                                         const_guest_handle_from_ptr(bmap_ptr,
                                                                      void),
-                                      mask);
+                                         mask);
             break;
         }
         if ( mask )
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 6bc9866359..2bcf5514e9 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -606,6 +606,10 @@ void write_32bit_pse_identmap(uint32_t *l2);
 struct domain *mm_get_pg_owner(domid_t domid);
 void mm_put_pg_owner(struct domain *pg_owner);
 
+int mm_vcpumask_to_pcpumask(struct domain *d,
+                            XEN_GUEST_HANDLE_PARAM(const_void) bmap,
+                            cpumask_t *pmask);
+
 #define MEM_LOG(_f, _a...) gdprintk(XENLOG_WARNING , _f "\n" , ## _a)
 
 #endif /* __ASM_X86_MM_H__ */
-- 
2.11.0



* [PATCH RFC 04/13] x86/mm: carve out create_grant_pv_mapping
  2017-03-27  9:10 [PATCH RFC 00/13] Refactor x86/mm.c Wei Liu
                   ` (2 preceding siblings ...)
  2017-03-27  9:10 ` [PATCH RFC 03/13] x86/mm: export vcpumask_to_pcpumask Wei Liu
@ 2017-03-27  9:10 ` Wei Liu
  2017-03-29 10:27   ` Andrew Cooper
  2017-03-27  9:10 ` [PATCH RFC 05/13] x86/mm: carve out replace_grant_pv_mapping Wei Liu
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 30+ messages in thread
From: Wei Liu @ 2017-03-27  9:10 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Jan Beulich

We will later split out PV-specific code to pv/mm.c.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 4bf94227cc..b44d44c782 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4242,15 +4242,12 @@ static int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
         return GNTST_okay;
 }
 
-int create_grant_host_mapping(uint64_t addr, unsigned long frame, 
-                              unsigned int flags, unsigned int cache_flags)
+static int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
+                                   unsigned int flags, unsigned int cache_flags)
 {
     l1_pgentry_t pte;
     uint32_t grant_pte_flags;
 
-    if ( paging_mode_external(current->domain) )
-        return create_grant_p2m_mapping(addr, frame, flags, cache_flags);
-
     grant_pte_flags =
         _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_GNTTAB;
     if ( cpu_has_nx )
@@ -4273,6 +4270,15 @@ int create_grant_host_mapping(uint64_t addr, unsigned long frame,
     return create_grant_va_mapping(addr, pte, current);
 }
 
+int create_grant_host_mapping(uint64_t addr, unsigned long frame,
+                              unsigned int flags, unsigned int cache_flags)
+{
+    if ( paging_mode_external(current->domain) )
+        return create_grant_p2m_mapping(addr, frame, flags, cache_flags);
+
+    return create_grant_pv_mapping(addr, frame, flags, cache_flags);
+}
+
 static int replace_grant_p2m_mapping(
     uint64_t addr, unsigned long frame, uint64_t new_addr, unsigned int flags)
 {
-- 
2.11.0



* [PATCH RFC 05/13] x86/mm: carve out replace_grant_pv_mapping
  2017-03-27  9:10 [PATCH RFC 00/13] Refactor x86/mm.c Wei Liu
                   ` (3 preceding siblings ...)
  2017-03-27  9:10 ` [PATCH RFC 04/13] x86/mm: carve out create_grant_pv_mapping Wei Liu
@ 2017-03-27  9:10 ` Wei Liu
  2017-03-27  9:10 ` [PATCH RFC 06/13] x86/mm: extract page table masks to mm.h Wei Liu
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-03-27  9:10 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Jan Beulich

We will later split out PV-specific code to pv/mm.c.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index b44d44c782..c4924521b0 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4304,23 +4304,20 @@ static int replace_grant_p2m_mapping(
     return GNTST_okay;
 }
 
-int replace_grant_host_mapping(
-    uint64_t addr, unsigned long frame, uint64_t new_addr, unsigned int flags)
+static int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
+                                    uint64_t new_addr, unsigned int flags)
 {
     struct vcpu *curr = current;
     l1_pgentry_t *pl1e, ol1e;
     unsigned long gl1mfn;
     struct page_info *l1pg;
     int rc;
-    
-    if ( paging_mode_external(current->domain) )
-        return replace_grant_p2m_mapping(addr, frame, new_addr, flags);
 
     if ( flags & GNTMAP_contains_pte )
     {
         if ( !new_addr )
             return destroy_grant_pte_mapping(addr, frame, curr->domain);
-        
+
         MEM_LOG("Unsupported grant table operation");
         return GNTST_general_error;
     }
@@ -4381,6 +4378,15 @@ int replace_grant_host_mapping(
     return rc;
 }
 
+int replace_grant_host_mapping(uint64_t addr, unsigned long frame,
+                               uint64_t new_addr, unsigned int flags)
+{
+    if ( paging_mode_external(current->domain) )
+        return replace_grant_p2m_mapping(addr, frame, new_addr, flags);
+
+    return replace_grant_pv_mapping(addr, frame, new_addr, flags);
+}
+
 int donate_page(
     struct domain *d, struct page_info *page, unsigned int memflags)
 {
-- 
2.11.0



* [PATCH RFC 06/13] x86/mm: extract page table masks to mm.h
  2017-03-27  9:10 [PATCH RFC 00/13] Refactor x86/mm.c Wei Liu
                   ` (4 preceding siblings ...)
  2017-03-27  9:10 ` [PATCH RFC 05/13] x86/mm: carve out replace_grant_pv_mapping Wei Liu
@ 2017-03-27  9:10 ` Wei Liu
  2017-03-27  9:10 ` [PATCH RFC 07/13] x86/mm: extract PAGE_CACHE_ATTRS " Wei Liu
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-03-27  9:10 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Jan Beulich

The masks are going to be needed by common page table management code
and PV page table management code.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
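To illustrate why the masks must be visible outside mm.c: the PV page
table code performs checks such as this one (quoted from mod_l1_entry,
shown here only as an example):

    if ( unlikely(l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom)) )
    {
        MEM_LOG("Bad L1 flags %x",
                l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom));
        return -EINVAL;
    }
---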
 xen/arch/x86/mm.c        | 21 +--------------------
 xen/include/asm-x86/mm.h | 21 +++++++++++++++++++++
 2 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index c4924521b0..93eb848e72 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -155,26 +155,7 @@ boolean_param("allowsuperpage", opt_allow_superpage);
 
 static void put_superpage(unsigned long mfn);
 
-static uint32_t base_disallow_mask;
-/* Global bit is allowed to be set on L1 PTEs. Intended for user mappings. */
-#define L1_DISALLOW_MASK ((base_disallow_mask | _PAGE_GNTTAB) & ~_PAGE_GLOBAL)
-
-#define L2_DISALLOW_MASK (unlikely(opt_allow_superpage) \
-                          ? base_disallow_mask & ~_PAGE_PSE \
-                          : base_disallow_mask)
-
-#define l3_disallow_mask(d) (!is_pv_32bit_domain(d) ? \
-                             base_disallow_mask : 0xFFFFF198U)
-
-#define L4_DISALLOW_MASK (base_disallow_mask)
-
-#define l1_disallow_mask(d)                                     \
-    ((d != dom_io) &&                                           \
-     (rangeset_is_empty((d)->iomem_caps) &&                     \
-      rangeset_is_empty((d)->arch.ioport_caps) &&               \
-      !has_arch_pdevs(d) &&                                     \
-      is_pv_domain(d)) ?                                        \
-     L1_DISALLOW_MASK : (L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS))
+uint32_t base_disallow_mask;
 
 static s8 __read_mostly opt_mmio_relax;
 static void __init parse_mmio_relax(const char *s)
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 2bcf5514e9..8d5e4ad6d9 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -536,6 +536,27 @@ void audit_domains(void);
 
 #endif
 
+extern uint32_t base_disallow_mask;
+/* Global bit is allowed to be set on L1 PTEs. Intended for user mappings. */
+#define L1_DISALLOW_MASK ((base_disallow_mask | _PAGE_GNTTAB) & ~_PAGE_GLOBAL)
+
+#define L2_DISALLOW_MASK (unlikely(opt_allow_superpage) \
+                          ? base_disallow_mask & ~_PAGE_PSE \
+                          : base_disallow_mask)
+
+#define l3_disallow_mask(d) (!is_pv_32bit_domain(d) ? \
+                             base_disallow_mask : 0xFFFFF198U)
+
+#define L4_DISALLOW_MASK (base_disallow_mask)
+
+#define l1_disallow_mask(d)                                     \
+    ((d != dom_io) &&                                           \
+     (rangeset_is_empty((d)->iomem_caps) &&                     \
+      rangeset_is_empty((d)->arch.ioport_caps) &&               \
+      !has_arch_pdevs(d) &&                                     \
+      is_pv_domain(d)) ?                                        \
+     L1_DISALLOW_MASK : (L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS))
+
 int new_guest_cr3(unsigned long pfn);
 void make_cr3(struct vcpu *v, unsigned long mfn);
 void update_cr3(struct vcpu *v);
-- 
2.11.0



* [PATCH RFC 07/13] x86/mm: extract PAGE_CACHE_ATTRS to mm.h
  2017-03-27  9:10 [PATCH RFC 00/13] Refactor x86/mm.c Wei Liu
                   ` (5 preceding siblings ...)
  2017-03-27  9:10 ` [PATCH RFC 06/13] x86/mm: extract page table masks to mm.h Wei Liu
@ 2017-03-27  9:10 ` Wei Liu
  2017-03-27  9:10 ` [PATCH RFC 08/13] x86/mm: extract adjust_guest_l*e macros " Wei Liu
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-03-27  9:10 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Jan Beulich

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c        | 2 --
 xen/include/asm-x86/mm.h | 2 ++
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 93eb848e72..888b4532f4 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -148,8 +148,6 @@ bool_t __read_mostly machine_to_phys_mapping_valid = 0;
 
 struct rangeset *__read_mostly mmio_ro_ranges;
 
-#define PAGE_CACHE_ATTRS (_PAGE_PAT|_PAGE_PCD|_PAGE_PWT)
-
 bool_t __read_mostly opt_allow_superpage;
 boolean_param("allowsuperpage", opt_allow_superpage);
 
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 8d5e4ad6d9..25f3bafd65 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -633,4 +633,6 @@ int mm_vcpumask_to_pcpumask(struct domain *d,
 
 #define MEM_LOG(_f, _a...) gdprintk(XENLOG_WARNING , _f "\n" , ## _a)
 
+#define PAGE_CACHE_ATTRS (_PAGE_PAT|_PAGE_PCD|_PAGE_PWT)
+
 #endif /* __ASM_X86_MM_H__ */
-- 
2.11.0



* [PATCH RFC 08/13] x86/mm: extract adjust_guest_l*e macros to mm.h
  2017-03-27  9:10 [PATCH RFC 00/13] Refactor x86/mm.c Wei Liu
                   ` (6 preceding siblings ...)
  2017-03-27  9:10 ` [PATCH RFC 07/13] x86/mm: extract PAGE_CACHE_ATTRS " Wei Liu
@ 2017-03-27  9:10 ` Wei Liu
  2017-03-27  9:10 ` [PATCH RFC 09/13] x86/mm: export a bunch of {get, put}_page functions Wei Liu
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-03-27  9:10 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Jan Beulich

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
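For reference, these macros are used when validating and installing
guest entries, e.g. in mod_l2_entry (quoted only as an example):

    /* Fast path for sufficiently-similar mappings. */
    if ( !l2e_has_changed(ol2e, nl2e, ~FASTPATH_FLAG_WHITELIST) )
    {
        adjust_guest_l2e(nl2e, d);
        if ( UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu, preserve_ad) )
            return 0;
        return -EBUSY;
    }
---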
 xen/arch/x86/mm.c        | 46 ----------------------------------------------
 xen/include/asm-x86/mm.h | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 46 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 888b4532f4..3e4ad22513 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1148,52 +1148,6 @@ get_page_from_l4e(
     return rc;
 }
 
-#define adjust_guest_l1e(pl1e, d)                                            \
-    do {                                                                     \
-        if ( likely(l1e_get_flags((pl1e)) & _PAGE_PRESENT) &&                \
-             likely(!is_pv_32bit_domain(d)) )                                \
-        {                                                                    \
-            /* _PAGE_GUEST_KERNEL page cannot have the Global bit set. */    \
-            if ( (l1e_get_flags((pl1e)) & (_PAGE_GUEST_KERNEL|_PAGE_GLOBAL)) \
-                 == (_PAGE_GUEST_KERNEL|_PAGE_GLOBAL) )                      \
-                MEM_LOG("Global bit is set to kernel page %lx",              \
-                        l1e_get_pfn((pl1e)));                                \
-            if ( !(l1e_get_flags((pl1e)) & _PAGE_USER) )                     \
-                l1e_add_flags((pl1e), (_PAGE_GUEST_KERNEL|_PAGE_USER));      \
-            if ( !(l1e_get_flags((pl1e)) & _PAGE_GUEST_KERNEL) )             \
-                l1e_add_flags((pl1e), (_PAGE_GLOBAL|_PAGE_USER));            \
-        }                                                                    \
-    } while ( 0 )
-
-#define adjust_guest_l2e(pl2e, d)                               \
-    do {                                                        \
-        if ( likely(l2e_get_flags((pl2e)) & _PAGE_PRESENT) &&   \
-             likely(!is_pv_32bit_domain(d)) )                   \
-            l2e_add_flags((pl2e), _PAGE_USER);                  \
-    } while ( 0 )
-
-#define adjust_guest_l3e(pl3e, d)                                   \
-    do {                                                            \
-        if ( likely(l3e_get_flags((pl3e)) & _PAGE_PRESENT) )        \
-            l3e_add_flags((pl3e), likely(!is_pv_32bit_domain(d)) ?  \
-                                         _PAGE_USER :               \
-                                         _PAGE_USER|_PAGE_RW);      \
-    } while ( 0 )
-
-#define adjust_guest_l4e(pl4e, d)                               \
-    do {                                                        \
-        if ( likely(l4e_get_flags((pl4e)) & _PAGE_PRESENT) &&   \
-             likely(!is_pv_32bit_domain(d)) )                   \
-            l4e_add_flags((pl4e), _PAGE_USER);                  \
-    } while ( 0 )
-
-#define unadjust_guest_l3e(pl3e, d)                                         \
-    do {                                                                    \
-        if ( unlikely(is_pv_32bit_domain(d)) &&                             \
-             likely(l3e_get_flags((pl3e)) & _PAGE_PRESENT) )                \
-            l3e_remove_flags((pl3e), _PAGE_USER|_PAGE_RW|_PAGE_ACCESSED);   \
-    } while ( 0 )
-
 void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
 {
     unsigned long     pfn = l1e_get_pfn(l1e);
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 25f3bafd65..4a5730e412 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -635,4 +635,51 @@ int mm_vcpumask_to_pcpumask(struct domain *d,
 
 #define PAGE_CACHE_ATTRS (_PAGE_PAT|_PAGE_PCD|_PAGE_PWT)
 
+#define adjust_guest_l1e(pl1e, d)                                            \
+    do {                                                                     \
+        if ( likely(l1e_get_flags((pl1e)) & _PAGE_PRESENT) &&                \
+             likely(!is_pv_32bit_domain(d)) )                                \
+        {                                                                    \
+            /* _PAGE_GUEST_KERNEL page cannot have the Global bit set. */    \
+            if ( (l1e_get_flags((pl1e)) & (_PAGE_GUEST_KERNEL|_PAGE_GLOBAL)) \
+                 == (_PAGE_GUEST_KERNEL|_PAGE_GLOBAL) )                      \
+                MEM_LOG("Global bit is set to kernel page %lx",              \
+                        l1e_get_pfn((pl1e)));                                \
+            if ( !(l1e_get_flags((pl1e)) & _PAGE_USER) )                     \
+                l1e_add_flags((pl1e), (_PAGE_GUEST_KERNEL|_PAGE_USER));      \
+            if ( !(l1e_get_flags((pl1e)) & _PAGE_GUEST_KERNEL) )             \
+                l1e_add_flags((pl1e), (_PAGE_GLOBAL|_PAGE_USER));            \
+        }                                                                    \
+    } while ( 0 )
+
+#define adjust_guest_l2e(pl2e, d)                               \
+    do {                                                        \
+        if ( likely(l2e_get_flags((pl2e)) & _PAGE_PRESENT) &&   \
+             likely(!is_pv_32bit_domain(d)) )                   \
+            l2e_add_flags((pl2e), _PAGE_USER);                  \
+    } while ( 0 )
+
+#define adjust_guest_l3e(pl3e, d)                                   \
+    do {                                                            \
+        if ( likely(l3e_get_flags((pl3e)) & _PAGE_PRESENT) )        \
+            l3e_add_flags((pl3e), likely(!is_pv_32bit_domain(d)) ?  \
+                                         _PAGE_USER :               \
+                                         _PAGE_USER|_PAGE_RW);      \
+    } while ( 0 )
+
+#define adjust_guest_l4e(pl4e, d)                               \
+    do {                                                        \
+        if ( likely(l4e_get_flags((pl4e)) & _PAGE_PRESENT) &&   \
+             likely(!is_pv_32bit_domain(d)) )                   \
+            l4e_add_flags((pl4e), _PAGE_USER);                  \
+    } while ( 0 )
+
+#define unadjust_guest_l3e(pl3e, d)                                         \
+    do {                                                                    \
+        if ( unlikely(is_pv_32bit_domain(d)) &&                             \
+             likely(l3e_get_flags((pl3e)) & _PAGE_PRESENT) )                \
+            l3e_remove_flags((pl3e), _PAGE_USER|_PAGE_RW|_PAGE_ACCESSED);   \
+    } while ( 0 )
+
+
 #endif /* __ASM_X86_MM_H__ */
-- 
2.11.0



* [PATCH RFC 09/13] x86/mm: export a bunch of {get, put}_page functions
  2017-03-27  9:10 [PATCH RFC 00/13] Refactor x86/mm.c Wei Liu
                   ` (7 preceding siblings ...)
  2017-03-27  9:10 ` [PATCH RFC 08/13] x86/mm: extract adjust_guest_l*e macros " Wei Liu
@ 2017-03-27  9:10 ` Wei Liu
  2017-03-27  9:10 ` [PATCH RFC 10/13] x86/mm: export invalidate_shadow_ldt Wei Liu
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-03-27  9:10 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Jan Beulich

They will be needed by common code and PV-specific code later.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
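A sketch of the usual get/put pairing these exports preserve; the call
site is hypothetical, only the function names and signatures come from
this patch:

    int rc = get_page_and_type_from_pagenr(mfn, PGT_l1_page_table, d,
                                           0 /* partial */,
                                           1 /* preemptible */);
    if ( rc )
        return rc;

    /* ... use the MFN as a validated L1 page table ... */

    put_page_and_type(mfn_to_page(mfn));
---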
 xen/arch/x86/mm.c        | 41 ++++++++++++++++++-----------------------
 xen/include/asm-x86/mm.h | 16 ++++++++++++++++
 2 files changed, 34 insertions(+), 23 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 3e4ad22513..13c8dd8ac0 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -678,7 +678,7 @@ int map_ldt_shadow_page(unsigned int off)
 }
 
 
-static int get_page_from_pagenr(unsigned long page_nr, struct domain *d)
+int get_page_from_pagenr(unsigned long page_nr, struct domain *d)
 {
     struct page_info *page = mfn_to_page(page_nr);
 
@@ -692,11 +692,11 @@ static int get_page_from_pagenr(unsigned long page_nr, struct domain *d)
 }
 
 
-static int get_page_and_type_from_pagenr(unsigned long page_nr, 
-                                         unsigned long type,
-                                         struct domain *d,
-                                         int partial,
-                                         int preemptible)
+int get_page_and_type_from_pagenr(unsigned long page_nr,
+                                  unsigned long type,
+                                  struct domain *d,
+                                  int partial,
+                                  int preemptible)
 {
     struct page_info *page = mfn_to_page(page_nr);
     int rc;
@@ -853,9 +853,8 @@ static int print_mmio_emul_range(unsigned long s, unsigned long e, void *arg)
 }
 #endif
 
-int
-get_page_from_l1e(
-    l1_pgentry_t l1e, struct domain *l1e_owner, struct domain *pg_owner)
+int get_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner,
+                      struct domain *pg_owner)
 {
     unsigned long mfn = l1e_get_pfn(l1e);
     struct page_info *page = mfn_to_page(mfn);
@@ -1057,9 +1056,7 @@ get_page_from_l1e(
 
 /* NB. Virtual address 'l2e' maps to a machine address within frame 'pfn'. */
 define_get_linear_pagetable(l2);
-static int
-get_page_from_l2e(
-    l2_pgentry_t l2e, unsigned long pfn, struct domain *d)
+int get_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn, struct domain *d)
 {
     unsigned long mfn = l2e_get_pfn(l2e);
     int rc;
@@ -1099,9 +1096,8 @@ get_page_from_l2e(
 
 
 define_get_linear_pagetable(l3);
-static int
-get_page_from_l3e(
-    l3_pgentry_t l3e, unsigned long pfn, struct domain *d, int partial)
+int get_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn, struct domain *d,
+                      int partial)
 {
     int rc;
 
@@ -1125,9 +1121,8 @@ get_page_from_l3e(
 }
 
 define_get_linear_pagetable(l4);
-static int
-get_page_from_l4e(
-    l4_pgentry_t l4e, unsigned long pfn, struct domain *d, int partial)
+int get_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn, struct domain *d,
+                      int partial)
 {
     int rc;
 
@@ -1209,7 +1204,7 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
  * NB. Virtual address 'l2e' maps to a machine address within frame 'pfn'.
  * Note also that this automatically deals correctly with linear p.t.'s.
  */
-static int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn)
+int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn)
 {
     if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) || (l2e_get_pfn(l2e) == pfn) )
         return 1;
@@ -1224,8 +1219,8 @@ static int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn)
 
 static int __put_page_type(struct page_info *, int preemptible);
 
-static int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn,
-                             int partial, bool_t defer)
+int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn,
+                      int partial, bool_t defer)
 {
     struct page_info *pg;
 
@@ -1262,8 +1257,8 @@ static int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn,
     return put_page_and_type_preemptible(pg);
 }
 
-static int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn,
-                             int partial, bool_t defer)
+int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn,
+                      int partial, bool_t defer)
 {
     if ( (l4e_get_flags(l4e) & _PAGE_PRESENT) && 
          (l4e_get_pfn(l4e) != pfn) )
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 4a5730e412..5e808c5c66 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -364,6 +364,22 @@ int  put_old_guest_table(struct vcpu *);
 int  get_page_from_l1e(
     l1_pgentry_t l1e, struct domain *l1e_owner, struct domain *pg_owner);
 void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner);
+int get_page_from_l2e( l2_pgentry_t l2e, unsigned long pfn, struct domain *d);
+int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn);
+int get_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn, struct domain *d,
+                      int partial);
+int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn,
+                      int partial, bool_t defer);
+int get_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn, struct domain *d,
+                      int partial);
+int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn,
+                      int partial, bool_t defer);
+int get_page_from_pagenr(unsigned long page_nr, struct domain *d);
+int get_page_and_type_from_pagenr(unsigned long page_nr,
+                                  unsigned long type,
+                                  struct domain *d,
+                                  int partial,
+                                  int preemptible);
 
 static inline void put_page_and_type(struct page_info *page)
 {
-- 
2.11.0



* [PATCH RFC 10/13] x86/mm: export invalidate_shadow_ldt
  2017-03-27  9:10 [PATCH RFC 00/13] Refactor x86/mm.c Wei Liu
                   ` (8 preceding siblings ...)
  2017-03-27  9:10 ` [PATCH RFC 09/13] x86/mm: export a bunch of {get, put}_page functions Wei Liu
@ 2017-03-27  9:10 ` Wei Liu
  2017-03-27  9:10 ` [PATCH RFC 11/13] x86/mm: export create_pae_xen_mappings Wei Liu
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-03-27  9:10 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Jan Beulich

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c        | 2 +-
 xen/include/asm-x86/mm.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 13c8dd8ac0..0f49671191 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -584,7 +584,7 @@ static inline void guest_get_eff_kern_l1e(struct vcpu *v, unsigned long addr,
 const char __section(".bss.page_aligned.const") __aligned(PAGE_SIZE)
     zero_page[PAGE_SIZE];
 
-static void invalidate_shadow_ldt(struct vcpu *v, int flush)
+void invalidate_shadow_ldt(struct vcpu *v, int flush)
 {
     l1_pgentry_t *pl1e;
     unsigned int i;
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 5e808c5c66..2567b1b83b 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -593,6 +593,7 @@ int donate_page(
     struct domain *d, struct page_info *page, unsigned int memflags);
 
 int map_ldt_shadow_page(unsigned int);
+void invalidate_shadow_ldt(struct vcpu *v, int flush);
 
 #define NIL(type) ((type *)-sizeof(type))
 #define IS_NIL(ptr) (!((uintptr_t)(ptr) + sizeof(*(ptr))))
-- 
2.11.0



* [PATCH RFC 11/13] x86/mm: export create_pae_xen_mappings
  2017-03-27  9:10 [PATCH RFC 00/13] Refactor x86/mm.c Wei Liu
                   ` (9 preceding siblings ...)
  2017-03-27  9:10 ` [PATCH RFC 10/13] x86/mm: export invalidate_shadow_ldt Wei Liu
@ 2017-03-27  9:10 ` Wei Liu
  2017-03-27  9:10 ` [PATCH RFC 12/13] x86/mm: split PV MMU code to pv/mm.c Wei Liu
  2017-03-27  9:10 ` [PATCH RFC 13/13] x86/mm: split HVM grant table code to hvm/mm.c Wei Liu
  12 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-03-27  9:10 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Jan Beulich

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c        | 2 +-
 xen/include/asm-x86/mm.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 0f49671191..92e79d7fb6 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1323,7 +1323,7 @@ static int alloc_l1_table(struct page_info *page)
     return ret;
 }
 
-static int create_pae_xen_mappings(struct domain *d, l3_pgentry_t *pl3e)
+int create_pae_xen_mappings(struct domain *d, l3_pgentry_t *pl3e)
 {
     struct page_info *page;
     l3_pgentry_t     l3e3;
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 2567b1b83b..967b7fcda9 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -604,6 +604,7 @@ int create_perdomain_mapping(struct domain *, unsigned long va,
 void destroy_perdomain_mapping(struct domain *, unsigned long va,
                                unsigned int nr);
 void free_perdomain_mappings(struct domain *);
+int create_pae_xen_mappings(struct domain *d, l3_pgentry_t *pl3e);
 
 extern int memory_add(unsigned long spfn, unsigned long epfn, unsigned int pxm);
 
-- 
2.11.0



* [PATCH RFC 12/13] x86/mm: split PV MMU code to pv/mm.c
  2017-03-27  9:10 [PATCH RFC 00/13] Refactor x86/mm.c Wei Liu
                   ` (10 preceding siblings ...)
  2017-03-27  9:10 ` [PATCH RFC 11/13] x86/mm: export create_pae_xen_mappings Wei Liu
@ 2017-03-27  9:10 ` Wei Liu
  2017-03-27  9:10 ` [PATCH RFC 13/13] x86/mm: split HVM grant table code to hvm/mm.c Wei Liu
  12 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-03-27  9:10 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Jan Beulich

Move the following PV-specific code to the new file:

1. Three hypercalls that are only available to PV guests:
   1. do_mmu_update
   2. do_update_va_mapping
   3. do_update_va_mapping_otherdomain
2. PV MMIO emulation code
3. PV writable page table emulation code
4. PV grant table creation / destruction code
5. Other supporting code for the above

Move everything in one patch because these pieces share a lot of code.

Also move the PV page table API comment and delete trailing
whitespace.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c        | 1918 +---------------------------------------------
 xen/arch/x86/pv/Makefile |    1 +
 xen/arch/x86/pv/mm.c     | 1902 +++++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/mm.h |    5 +
 4 files changed, 1935 insertions(+), 1891 deletions(-)
 create mode 100644 xen/arch/x86/pv/mm.c

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 92e79d7fb6..0119cacc43 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -18,71 +18,6 @@
  * along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
-/*
- * A description of the x86 page table API:
- * 
- * Domains trap to do_mmu_update with a list of update requests.
- * This is a list of (ptr, val) pairs, where the requested operation
- * is *ptr = val.
- * 
- * Reference counting of pages:
- * ----------------------------
- * Each page has two refcounts: tot_count and type_count.
- * 
- * TOT_COUNT is the obvious reference count. It counts all uses of a
- * physical page frame by a domain, including uses as a page directory,
- * a page table, or simple mappings via a PTE. This count prevents a
- * domain from releasing a frame back to the free pool when it still holds
- * a reference to it.
- * 
- * TYPE_COUNT is more subtle. A frame can be put to one of three
- * mutually-exclusive uses: it might be used as a page directory, or a
- * page table, or it may be mapped writable by the domain [of course, a
- * frame may not be used in any of these three ways!].
- * So, type_count is a count of the number of times a frame is being 
- * referred to in its current incarnation. Therefore, a page can only
- * change its type when its type count is zero.
- * 
- * Pinning the page type:
- * ----------------------
- * The type of a page can be pinned/unpinned with the commands
- * MMUEXT_[UN]PIN_L?_TABLE. Each page can be pinned exactly once (that is,
- * pinning is not reference counted, so it can't be nested).
- * This is useful to prevent a page's type count falling to zero, at which
- * point safety checks would need to be carried out next time the count
- * is increased again.
- * 
- * A further note on writable page mappings:
- * -----------------------------------------
- * For simplicity, the count of writable mappings for a page may not
- * correspond to reality. The 'writable count' is incremented for every
- * PTE which maps the page with the _PAGE_RW flag set. However, for
- * write access to be possible the page directory entry must also have
- * its _PAGE_RW bit set. We do not check this as it complicates the 
- * reference counting considerably [consider the case of multiple
- * directory entries referencing a single page table, some with the RW
- * bit set, others not -- it starts getting a bit messy].
- * In normal use, this simplification shouldn't be a problem.
- * However, the logic can be added if required.
- * 
- * One more note on read-only page mappings:
- * -----------------------------------------
- * We want domains to be able to map pages for read-only access. The
- * main reason is that page tables and directories should be readable
- * by a domain, but it would not be safe for them to be writable.
- * However, domains have free access to rings 1 & 2 of the Intel
- * privilege model. In terms of page protection, these are considered
- * to be part of 'supervisor mode'. The WP bit in CR0 controls whether
- * read-only restrictions are respected in supervisor mode -- if the 
- * bit is clear then any mapped page is writable.
- * 
- * We get round this by always setting the WP bit and disallowing 
- * updates to it. This is very unlikely to cause a problem for guest
- * OS's, which will generally use the WP bit to simplify copy-on-write
- * implementation (in that case, OS wants a fault when it writes to
- * an application-supplied buffer).
- */
-
 #include <xen/init.h>
 #include <xen/kernel.h>
 #include <xen/lib.h>
@@ -127,14 +62,6 @@
 l1_pgentry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
     l1_fixmap[L1_PAGETABLE_ENTRIES];
 
-/*
- * PTE updates can be done with ordinary writes except:
- *  1. Debug builds get extra checking by using CMPXCHG[8B].
- */
-#if !defined(NDEBUG)
-#define PTE_UPDATE_WITH_CMPXCHG
-#endif
-
 paddr_t __read_mostly mem_hotplug;
 
 /* Private domain structs for DOMID_XEN and DOMID_IO. */
@@ -520,67 +447,6 @@ void update_cr3(struct vcpu *v)
     make_cr3(v, cr3_mfn);
 }
 
-/* Get a mapping of a PV guest's l1e for this virtual address. */
-static l1_pgentry_t *guest_map_l1e(unsigned long addr, unsigned long *gl1mfn)
-{
-    l2_pgentry_t l2e;
-
-    ASSERT(!paging_mode_translate(current->domain));
-    ASSERT(!paging_mode_external(current->domain));
-
-    if ( unlikely(!__addr_ok(addr)) )
-        return NULL;
-
-    /* Find this l1e and its enclosing l1mfn in the linear map. */
-    if ( __copy_from_user(&l2e,
-                          &__linear_l2_table[l2_linear_offset(addr)],
-                          sizeof(l2_pgentry_t)) )
-        return NULL;
-
-    /* Check flags that it will be safe to read the l1e. */
-    if ( (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT )
-        return NULL;
-
-    *gl1mfn = l2e_get_pfn(l2e);
-
-    return (l1_pgentry_t *)map_domain_page(_mfn(*gl1mfn)) +
-           l1_table_offset(addr);
-}
-
-/* Pull down the mapping we got from guest_map_l1e(). */
-static inline void guest_unmap_l1e(void *p)
-{
-    unmap_domain_page(p);
-}
-
-/* Read a PV guest's l1e that maps this virtual address. */
-static inline void guest_get_eff_l1e(unsigned long addr, l1_pgentry_t *eff_l1e)
-{
-    ASSERT(!paging_mode_translate(current->domain));
-    ASSERT(!paging_mode_external(current->domain));
-
-    if ( unlikely(!__addr_ok(addr)) ||
-         __copy_from_user(eff_l1e,
-                          &__linear_l1_table[l1_linear_offset(addr)],
-                          sizeof(l1_pgentry_t)) )
-        *eff_l1e = l1e_empty();
-}
-
-/*
- * Read the guest's l1e that maps this address, from the kernel-mode
- * page tables.
- */
-static inline void guest_get_eff_kern_l1e(struct vcpu *v, unsigned long addr,
-                                          void *eff_l1e)
-{
-    bool_t user_mode = !(v->arch.flags & TF_kernel_mode);
-#define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
-
-    TOGGLE_MODE();
-    guest_get_eff_l1e(addr, eff_l1e);
-    TOGGLE_MODE();
-}
-
 const char __section(".bss.page_aligned.const") __aligned(PAGE_SIZE)
     zero_page[PAGE_SIZE];
 
@@ -635,49 +501,6 @@ static int alloc_segdesc_page(struct page_info *page)
     return i == 512 ? 0 : -EINVAL;
 }
 
-
-/* Map shadow page at offset @off. */
-int map_ldt_shadow_page(unsigned int off)
-{
-    struct vcpu *v = current;
-    struct domain *d = v->domain;
-    unsigned long gmfn;
-    struct page_info *page;
-    l1_pgentry_t l1e, nl1e;
-    unsigned long gva = v->arch.pv_vcpu.ldt_base + (off << PAGE_SHIFT);
-    int okay;
-
-    BUG_ON(unlikely(in_irq()));
-
-    if ( is_pv_32bit_domain(d) )
-        gva = (u32)gva;
-    guest_get_eff_kern_l1e(v, gva, &l1e);
-    if ( unlikely(!(l1e_get_flags(l1e) & _PAGE_PRESENT)) )
-        return 0;
-
-    gmfn = l1e_get_pfn(l1e);
-    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
-    if ( unlikely(!page) )
-        return 0;
-
-    okay = get_page_type(page, PGT_seg_desc_page);
-    if ( unlikely(!okay) )
-    {
-        put_page(page);
-        return 0;
-    }
-
-    nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(l1e) | _PAGE_RW);
-
-    spin_lock(&v->arch.pv_vcpu.shadow_ldt_lock);
-    l1e_write(&gdt_ldt_ptes(d, v)[off + 16], nl1e);
-    v->arch.pv_vcpu.shadow_ldt_mapcnt++;
-    spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock);
-
-    return 1;
-}
-
-
 int get_page_from_pagenr(unsigned long page_nr, struct domain *d)
 {
     struct page_info *page = mfn_to_page(page_nr);
@@ -1744,344 +1567,6 @@ void page_unlock(struct page_info *page)
     } while ( (y = cmpxchg(&page->u.inuse.type_info, x, nx)) != x );
 }
 
-/* How to write an entry to the guest pagetables.
- * Returns 0 for failure (pointer not valid), 1 for success. */
-static inline int update_intpte(intpte_t *p, 
-                                intpte_t old, 
-                                intpte_t new,
-                                unsigned long mfn,
-                                struct vcpu *v,
-                                int preserve_ad)
-{
-    int rv = 1;
-#ifndef PTE_UPDATE_WITH_CMPXCHG
-    if ( !preserve_ad )
-    {
-        rv = paging_write_guest_entry(v, p, new, _mfn(mfn));
-    }
-    else
-#endif
-    {
-        intpte_t t = old;
-        for ( ; ; )
-        {
-            intpte_t _new = new;
-            if ( preserve_ad )
-                _new |= old & (_PAGE_ACCESSED | _PAGE_DIRTY);
-
-            rv = paging_cmpxchg_guest_entry(v, p, &t, _new, _mfn(mfn));
-            if ( unlikely(rv == 0) )
-            {
-                MEM_LOG("Failed to update %" PRIpte " -> %" PRIpte
-                        ": saw %" PRIpte, old, _new, t);
-                break;
-            }
-
-            if ( t == old )
-                break;
-
-            /* Allowed to change in Accessed/Dirty flags only. */
-            BUG_ON((t ^ old) & ~(intpte_t)(_PAGE_ACCESSED|_PAGE_DIRTY));
-
-            old = t;
-        }
-    }
-    return rv;
-}
-
-/* Macro that wraps the appropriate type-changes around update_intpte().
- * Arguments are: type, ptr, old, new, mfn, vcpu */
-#define UPDATE_ENTRY(_t,_p,_o,_n,_m,_v,_ad)                         \
-    update_intpte(&_t ## e_get_intpte(*(_p)),                       \
-                  _t ## e_get_intpte(_o), _t ## e_get_intpte(_n),   \
-                  (_m), (_v), (_ad))
-
-/*
- * PTE flags that a guest may change without re-validating the PTE.
- * All other bits affect translation, caching, or Xen's safety.
- */
-#define FASTPATH_FLAG_WHITELIST                                     \
-    (_PAGE_NX_BIT | _PAGE_AVAIL_HIGH | _PAGE_AVAIL | _PAGE_GLOBAL | \
-     _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_USER)
-
-/* Update the L1 entry at pl1e to new value nl1e. */
-static int mod_l1_entry(l1_pgentry_t *pl1e, l1_pgentry_t nl1e,
-                        unsigned long gl1mfn, int preserve_ad,
-                        struct vcpu *pt_vcpu, struct domain *pg_dom)
-{
-    l1_pgentry_t ol1e;
-    struct domain *pt_dom = pt_vcpu->domain;
-    int rc = 0;
-
-    if ( unlikely(__copy_from_user(&ol1e, pl1e, sizeof(ol1e)) != 0) )
-        return -EFAULT;
-
-    if ( unlikely(paging_mode_refcounts(pt_dom)) )
-    {
-        if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, preserve_ad) )
-            return 0;
-        return -EBUSY;
-    }
-
-    if ( l1e_get_flags(nl1e) & _PAGE_PRESENT )
-    {
-        /* Translate foreign guest addresses. */
-        struct page_info *page = NULL;
-
-        if ( unlikely(l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom)) )
-        {
-            MEM_LOG("Bad L1 flags %x",
-                    l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom));
-            return -EINVAL;
-        }
-
-        if ( paging_mode_translate(pg_dom) )
-        {
-            page = get_page_from_gfn(pg_dom, l1e_get_pfn(nl1e), NULL, P2M_ALLOC);
-            if ( !page )
-                return -EINVAL;
-            nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(nl1e));
-        }
-
-        /* Fast path for sufficiently-similar mappings. */
-        if ( !l1e_has_changed(ol1e, nl1e, ~FASTPATH_FLAG_WHITELIST) )
-        {
-            adjust_guest_l1e(nl1e, pt_dom);
-            rc = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
-                              preserve_ad);
-            if ( page )
-                put_page(page);
-            return rc ? 0 : -EBUSY;
-        }
-
-        switch ( rc = get_page_from_l1e(nl1e, pt_dom, pg_dom) )
-        {
-        default:
-            if ( page )
-                put_page(page);
-            return rc;
-        case 0:
-            break;
-        case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
-            ASSERT(!(rc & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
-            l1e_flip_flags(nl1e, rc);
-            rc = 0;
-            break;
-        }
-        if ( page )
-            put_page(page);
-
-        adjust_guest_l1e(nl1e, pt_dom);
-        if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
-                                    preserve_ad)) )
-        {
-            ol1e = nl1e;
-            rc = -EBUSY;
-        }
-    }
-    else if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
-                                     preserve_ad)) )
-    {
-        return -EBUSY;
-    }
-
-    put_page_from_l1e(ol1e, pt_dom);
-    return rc;
-}
-
-
-/* Update the L2 entry at pl2e to new value nl2e. pl2e is within frame pfn. */
-static int mod_l2_entry(l2_pgentry_t *pl2e, 
-                        l2_pgentry_t nl2e, 
-                        unsigned long pfn,
-                        int preserve_ad,
-                        struct vcpu *vcpu)
-{
-    l2_pgentry_t ol2e;
-    struct domain *d = vcpu->domain;
-    struct page_info *l2pg = mfn_to_page(pfn);
-    unsigned long type = l2pg->u.inuse.type_info;
-    int rc = 0;
-
-    if ( unlikely(!is_guest_l2_slot(d, type, pgentry_ptr_to_slot(pl2e))) )
-    {
-        MEM_LOG("Illegal L2 update attempt in Xen-private area %p", pl2e);
-        return -EPERM;
-    }
-
-    if ( unlikely(__copy_from_user(&ol2e, pl2e, sizeof(ol2e)) != 0) )
-        return -EFAULT;
-
-    if ( l2e_get_flags(nl2e) & _PAGE_PRESENT )
-    {
-        if ( unlikely(l2e_get_flags(nl2e) & L2_DISALLOW_MASK) )
-        {
-            MEM_LOG("Bad L2 flags %x",
-                    l2e_get_flags(nl2e) & L2_DISALLOW_MASK);
-            return -EINVAL;
-        }
-
-        /* Fast path for sufficiently-similar mappings. */
-        if ( !l2e_has_changed(ol2e, nl2e, ~FASTPATH_FLAG_WHITELIST) )
-        {
-            adjust_guest_l2e(nl2e, d);
-            if ( UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu, preserve_ad) )
-                return 0;
-            return -EBUSY;
-        }
-
-        if ( unlikely((rc = get_page_from_l2e(nl2e, pfn, d)) < 0) )
-            return rc;
-
-        adjust_guest_l2e(nl2e, d);
-        if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu,
-                                    preserve_ad)) )
-        {
-            ol2e = nl2e;
-            rc = -EBUSY;
-        }
-    }
-    else if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu,
-                                     preserve_ad)) )
-    {
-        return -EBUSY;
-    }
-
-    put_page_from_l2e(ol2e, pfn);
-    return rc;
-}
-
-/* Update the L3 entry at pl3e to new value nl3e. pl3e is within frame pfn. */
-static int mod_l3_entry(l3_pgentry_t *pl3e, 
-                        l3_pgentry_t nl3e, 
-                        unsigned long pfn,
-                        int preserve_ad,
-                        struct vcpu *vcpu)
-{
-    l3_pgentry_t ol3e;
-    struct domain *d = vcpu->domain;
-    int rc = 0;
-
-    if ( unlikely(!is_guest_l3_slot(pgentry_ptr_to_slot(pl3e))) )
-    {
-        MEM_LOG("Illegal L3 update attempt in Xen-private area %p", pl3e);
-        return -EINVAL;
-    }
-
-    /*
-     * Disallow updates to final L3 slot. It contains Xen mappings, and it
-     * would be a pain to ensure they remain continuously valid throughout.
-     */
-    if ( is_pv_32bit_domain(d) && (pgentry_ptr_to_slot(pl3e) >= 3) )
-        return -EINVAL;
-
-    if ( unlikely(__copy_from_user(&ol3e, pl3e, sizeof(ol3e)) != 0) )
-        return -EFAULT;
-
-    if ( l3e_get_flags(nl3e) & _PAGE_PRESENT )
-    {
-        if ( unlikely(l3e_get_flags(nl3e) & l3_disallow_mask(d)) )
-        {
-            MEM_LOG("Bad L3 flags %x",
-                    l3e_get_flags(nl3e) & l3_disallow_mask(d));
-            return -EINVAL;
-        }
-
-        /* Fast path for sufficiently-similar mappings. */
-        if ( !l3e_has_changed(ol3e, nl3e, ~FASTPATH_FLAG_WHITELIST) )
-        {
-            adjust_guest_l3e(nl3e, d);
-            rc = UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu, preserve_ad);
-            return rc ? 0 : -EFAULT;
-        }
-
-        rc = get_page_from_l3e(nl3e, pfn, d, 0);
-        if ( unlikely(rc < 0) )
-            return rc;
-        rc = 0;
-
-        adjust_guest_l3e(nl3e, d);
-        if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu,
-                                    preserve_ad)) )
-        {
-            ol3e = nl3e;
-            rc = -EFAULT;
-        }
-    }
-    else if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu,
-                                     preserve_ad)) )
-    {
-        return -EFAULT;
-    }
-
-    if ( likely(rc == 0) )
-        if ( !create_pae_xen_mappings(d, pl3e) )
-            BUG();
-
-    put_page_from_l3e(ol3e, pfn, 0, 1);
-    return rc;
-}
-
-/* Update the L4 entry at pl4e to new value nl4e. pl4e is within frame pfn. */
-static int mod_l4_entry(l4_pgentry_t *pl4e, 
-                        l4_pgentry_t nl4e, 
-                        unsigned long pfn,
-                        int preserve_ad,
-                        struct vcpu *vcpu)
-{
-    struct domain *d = vcpu->domain;
-    l4_pgentry_t ol4e;
-    int rc = 0;
-
-    if ( unlikely(!is_guest_l4_slot(d, pgentry_ptr_to_slot(pl4e))) )
-    {
-        MEM_LOG("Illegal L4 update attempt in Xen-private area %p", pl4e);
-        return -EINVAL;
-    }
-
-    if ( unlikely(__copy_from_user(&ol4e, pl4e, sizeof(ol4e)) != 0) )
-        return -EFAULT;
-
-    if ( l4e_get_flags(nl4e) & _PAGE_PRESENT )
-    {
-        if ( unlikely(l4e_get_flags(nl4e) & L4_DISALLOW_MASK) )
-        {
-            MEM_LOG("Bad L4 flags %x",
-                    l4e_get_flags(nl4e) & L4_DISALLOW_MASK);
-            return -EINVAL;
-        }
-
-        /* Fast path for sufficiently-similar mappings. */
-        if ( !l4e_has_changed(ol4e, nl4e, ~FASTPATH_FLAG_WHITELIST) )
-        {
-            adjust_guest_l4e(nl4e, d);
-            rc = UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu, preserve_ad);
-            return rc ? 0 : -EFAULT;
-        }
-
-        rc = get_page_from_l4e(nl4e, pfn, d, 0);
-        if ( unlikely(rc < 0) )
-            return rc;
-        rc = 0;
-
-        adjust_guest_l4e(nl4e, d);
-        if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu,
-                                    preserve_ad)) )
-        {
-            ol4e = nl4e;
-            rc = -EFAULT;
-        }
-    }
-    else if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu,
-                                     preserve_ad)) )
-    {
-        return -EFAULT;
-    }
-
-    put_page_from_l4e(ol4e, pfn, 0, 1);
-    return rc;
-}
-
 static int cleanup_page_cacheattr(struct page_info *page)
 {
     unsigned int cacheattr =
@@ -2849,125 +2334,23 @@ int vcpu_destroy_pagetables(struct vcpu *v)
     return rc != -EINTR ? rc : -ERESTART;
 }
 
-int new_guest_cr3(unsigned long mfn)
+struct domain *mm_get_pg_owner(domid_t domid)
 {
-    struct vcpu *curr = current;
-    struct domain *d = curr->domain;
-    int rc;
-    unsigned long old_base_mfn;
+    struct domain *pg_owner = NULL, *curr = current->domain;
 
-    if ( is_pv_32bit_domain(d) )
+    if ( likely(domid == DOMID_SELF) )
     {
-        unsigned long gt_mfn = pagetable_get_pfn(curr->arch.guest_table);
-        l4_pgentry_t *pl4e = map_domain_page(_mfn(gt_mfn));
-
-        rc = paging_mode_refcounts(d)
-             ? -EINVAL /* Old code was broken, but what should it be? */
-             : mod_l4_entry(
-                    pl4e,
-                    l4e_from_pfn(
-                        mfn,
-                        (_PAGE_PRESENT|_PAGE_RW|_PAGE_USER|_PAGE_ACCESSED)),
-                    gt_mfn, 0, curr);
-        unmap_domain_page(pl4e);
-        switch ( rc )
-        {
-        case 0:
-            break;
-        case -EINTR:
-        case -ERESTART:
-            return -ERESTART;
-        default:
-            MEM_LOG("Error while installing new compat baseptr %lx", mfn);
-            return rc;
-        }
-
-        invalidate_shadow_ldt(curr, 0);
-        write_ptbase(curr);
-
-        return 0;
+        pg_owner = rcu_lock_current_domain();
+        goto out;
     }
 
-    rc = put_old_guest_table(curr);
-    if ( unlikely(rc) )
-        return rc;
-
-    old_base_mfn = pagetable_get_pfn(curr->arch.guest_table);
-    /*
-     * This is particularly important when getting restarted after the
-     * previous attempt got preempted in the put-old-MFN phase.
-     */
-    if ( old_base_mfn == mfn )
+    if ( unlikely(domid == curr->domain_id) )
     {
-        write_ptbase(curr);
-        return 0;
+        MEM_LOG("Cannot specify itself as foreign domain");
+        goto out;
     }
 
-    rc = paging_mode_refcounts(d)
-         ? (get_page_from_pagenr(mfn, d) ? 0 : -EINVAL)
-         : get_page_and_type_from_pagenr(mfn, PGT_root_page_table, d, 0, 1);
-    switch ( rc )
-    {
-    case 0:
-        break;
-    case -EINTR:
-    case -ERESTART:
-        return -ERESTART;
-    default:
-        MEM_LOG("Error while installing new baseptr %lx", mfn);
-        return rc;
-    }
-
-    invalidate_shadow_ldt(curr, 0);
-
-    if ( !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) )
-        fill_ro_mpt(mfn);
-    curr->arch.guest_table = pagetable_from_pfn(mfn);
-    update_cr3(curr);
-
-    write_ptbase(curr);
-
-    if ( likely(old_base_mfn != 0) )
-    {
-        struct page_info *page = mfn_to_page(old_base_mfn);
-
-        if ( paging_mode_refcounts(d) )
-            put_page(page);
-        else
-            switch ( rc = put_page_and_type_preemptible(page) )
-            {
-            case -EINTR:
-                rc = -ERESTART;
-                /* fallthrough */
-            case -ERESTART:
-                curr->arch.old_guest_table = page;
-                break;
-            default:
-                BUG_ON(rc);
-                break;
-            }
-    }
-
-    return rc;
-}
-
-struct domain *mm_get_pg_owner(domid_t domid)
-{
-    struct domain *pg_owner = NULL, *curr = current->domain;
-
-    if ( likely(domid == DOMID_SELF) )
-    {
-        pg_owner = rcu_lock_current_domain();
-        goto out;
-    }
-
-    if ( unlikely(domid == curr->domain_id) )
-    {
-        MEM_LOG("Cannot specify itself as foreign domain");
-        goto out;
-    }
-
-    if ( !is_hvm_domain(curr) && unlikely(paging_mode_translate(curr)) )
+    if ( !is_hvm_domain(curr) && unlikely(paging_mode_translate(curr)) )
     {
         MEM_LOG("Cannot mix foreign mappings with translated domains");
         goto out;
@@ -3581,572 +2964,6 @@ long do_mmuext_op(
     return rc;
 }
 
-long do_mmu_update(
-    XEN_GUEST_HANDLE_PARAM(mmu_update_t) ureqs,
-    unsigned int count,
-    XEN_GUEST_HANDLE_PARAM(uint) pdone,
-    unsigned int foreigndom)
-{
-    struct mmu_update req;
-    void *va;
-    unsigned long gpfn, gmfn, mfn;
-    struct page_info *page;
-    unsigned int cmd, i = 0, done = 0, pt_dom;
-    struct vcpu *curr = current, *v = curr;
-    struct domain *d = v->domain, *pt_owner = d, *pg_owner;
-    struct domain_mmap_cache mapcache;
-    uint32_t xsm_needed = 0;
-    uint32_t xsm_checked = 0;
-    int rc = put_old_guest_table(curr);
-
-    if ( unlikely(rc) )
-    {
-        if ( likely(rc == -ERESTART) )
-            rc = hypercall_create_continuation(
-                     __HYPERVISOR_mmu_update, "hihi", ureqs, count, pdone,
-                     foreigndom);
-        return rc;
-    }
-
-    if ( unlikely(count == MMU_UPDATE_PREEMPTED) &&
-         likely(guest_handle_is_null(ureqs)) )
-    {
-        /* See the curr->arch.old_guest_table related
-         * hypercall_create_continuation() below. */
-        return (int)foreigndom;
-    }
-
-    if ( unlikely(count & MMU_UPDATE_PREEMPTED) )
-    {
-        count &= ~MMU_UPDATE_PREEMPTED;
-        if ( unlikely(!guest_handle_is_null(pdone)) )
-            (void)copy_from_guest(&done, pdone, 1);
-    }
-    else
-        perfc_incr(calls_to_mmu_update);
-
-    if ( unlikely(!guest_handle_okay(ureqs, count)) )
-        return -EFAULT;
-
-    if ( (pt_dom = foreigndom >> 16) != 0 )
-    {
-        /* Pagetables belong to a foreign domain (PFD). */
-        if ( (pt_owner = rcu_lock_domain_by_id(pt_dom - 1)) == NULL )
-            return -ESRCH;
-
-        if ( pt_owner == d )
-            rcu_unlock_domain(pt_owner);
-        else if ( !pt_owner->vcpu || (v = pt_owner->vcpu[0]) == NULL )
-        {
-            rc = -EINVAL;
-            goto out;
-        }
-    }
-
-    if ( (pg_owner = mm_get_pg_owner((uint16_t)foreigndom)) == NULL )
-    {
-        rc = -ESRCH;
-        goto out;
-    }
-
-    domain_mmap_cache_init(&mapcache);
-
-    for ( i = 0; i < count; i++ )
-    {
-        if ( curr->arch.old_guest_table || (i && hypercall_preempt_check()) )
-        {
-            rc = -ERESTART;
-            break;
-        }
-
-        if ( unlikely(__copy_from_guest(&req, ureqs, 1) != 0) )
-        {
-            MEM_LOG("Bad __copy_from_guest");
-            rc = -EFAULT;
-            break;
-        }
-
-        cmd = req.ptr & (sizeof(l1_pgentry_t)-1);
-
-        switch ( cmd )
-        {
-            /*
-             * MMU_NORMAL_PT_UPDATE: Normal update to any level of page table.
-             * MMU_PT_UPDATE_PRESERVE_AD: As above but also preserve (OR)
-             * current A/D bits.
-             */
-        case MMU_NORMAL_PT_UPDATE:
-        case MMU_PT_UPDATE_PRESERVE_AD:
-        {
-            p2m_type_t p2mt;
-
-            rc = -EOPNOTSUPP;
-            if ( unlikely(paging_mode_refcounts(pt_owner)) )
-                break;
-
-            xsm_needed |= XSM_MMU_NORMAL_UPDATE;
-            if ( get_pte_flags(req.val) & _PAGE_PRESENT )
-            {
-                xsm_needed |= XSM_MMU_UPDATE_READ;
-                if ( get_pte_flags(req.val) & _PAGE_RW )
-                    xsm_needed |= XSM_MMU_UPDATE_WRITE;
-            }
-            if ( xsm_needed != xsm_checked )
-            {
-                rc = xsm_mmu_update(XSM_TARGET, d, pt_owner, pg_owner, xsm_needed);
-                if ( rc )
-                    break;
-                xsm_checked = xsm_needed;
-            }
-            rc = -EINVAL;
-
-            req.ptr -= cmd;
-            gmfn = req.ptr >> PAGE_SHIFT;
-            page = get_page_from_gfn(pt_owner, gmfn, &p2mt, P2M_ALLOC);
-
-            if ( p2m_is_paged(p2mt) )
-            {
-                ASSERT(!page);
-                p2m_mem_paging_populate(pg_owner, gmfn);
-                rc = -ENOENT;
-                break;
-            }
-
-            if ( unlikely(!page) )
-            {
-                MEM_LOG("Could not get page for normal update");
-                break;
-            }
-
-            mfn = page_to_mfn(page);
-            va = map_domain_page_with_cache(mfn, &mapcache);
-            va = (void *)((unsigned long)va +
-                          (unsigned long)(req.ptr & ~PAGE_MASK));
-
-            if ( page_lock(page) )
-            {
-                switch ( page->u.inuse.type_info & PGT_type_mask )
-                {
-                case PGT_l1_page_table:
-                {
-                    l1_pgentry_t l1e = l1e_from_intpte(req.val);
-                    p2m_type_t l1e_p2mt = p2m_ram_rw;
-                    struct page_info *target = NULL;
-                    p2m_query_t q = (l1e_get_flags(l1e) & _PAGE_RW) ?
-                                        P2M_UNSHARE : P2M_ALLOC;
-
-                    if ( paging_mode_translate(pg_owner) )
-                        target = get_page_from_gfn(pg_owner, l1e_get_pfn(l1e),
-                                                   &l1e_p2mt, q);
-
-                    if ( p2m_is_paged(l1e_p2mt) )
-                    {
-                        if ( target )
-                            put_page(target);
-                        p2m_mem_paging_populate(pg_owner, l1e_get_pfn(l1e));
-                        rc = -ENOENT;
-                        break;
-                    }
-                    else if ( p2m_ram_paging_in == l1e_p2mt && !target )
-                    {
-                        rc = -ENOENT;
-                        break;
-                    }
-                    /* If we tried to unshare and failed */
-                    else if ( (q & P2M_UNSHARE) && p2m_is_shared(l1e_p2mt) )
-                    {
-                        /* We could not have obtained a page ref. */
-                        ASSERT(target == NULL);
-                        /* And mem_sharing_notify has already been called. */
-                        rc = -ENOMEM;
-                        break;
-                    }
-
-                    rc = mod_l1_entry(va, l1e, mfn,
-                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v,
-                                      pg_owner);
-                    if ( target )
-                        put_page(target);
-                }
-                break;
-                case PGT_l2_page_table:
-                    rc = mod_l2_entry(va, l2e_from_intpte(req.val), mfn,
-                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
-                    break;
-                case PGT_l3_page_table:
-                    rc = mod_l3_entry(va, l3e_from_intpte(req.val), mfn,
-                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
-                    break;
-                case PGT_l4_page_table:
-                    rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
-                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
-                break;
-                case PGT_writable_page:
-                    perfc_incr(writable_mmu_updates);
-                    if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) )
-                        rc = 0;
-                    break;
-                }
-                page_unlock(page);
-                if ( rc == -EINTR )
-                    rc = -ERESTART;
-            }
-            else if ( get_page_type(page, PGT_writable_page) )
-            {
-                perfc_incr(writable_mmu_updates);
-                if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) )
-                    rc = 0;
-                put_page_type(page);
-            }
-
-            unmap_domain_page_with_cache(va, &mapcache);
-            put_page(page);
-        }
-        break;
-
-        case MMU_MACHPHYS_UPDATE:
-            if ( unlikely(d != pt_owner) )
-            {
-                rc = -EPERM;
-                break;
-            }
-
-            if ( unlikely(paging_mode_translate(pg_owner)) )
-            {
-                rc = -EINVAL;
-                break;
-            }
-
-            mfn = req.ptr >> PAGE_SHIFT;
-            gpfn = req.val;
-
-            xsm_needed |= XSM_MMU_MACHPHYS_UPDATE;
-            if ( xsm_needed != xsm_checked )
-            {
-                rc = xsm_mmu_update(XSM_TARGET, d, NULL, pg_owner, xsm_needed);
-                if ( rc )
-                    break;
-                xsm_checked = xsm_needed;
-            }
-
-            if ( unlikely(!get_page_from_pagenr(mfn, pg_owner)) )
-            {
-                MEM_LOG("Could not get page for mach->phys update");
-                rc = -EINVAL;
-                break;
-            }
-
-            set_gpfn_from_mfn(mfn, gpfn);
-
-            paging_mark_dirty(pg_owner, _mfn(mfn));
-
-            put_page(mfn_to_page(mfn));
-            break;
-
-        default:
-            MEM_LOG("Invalid page update command %x", cmd);
-            rc = -ENOSYS;
-            break;
-        }
-
-        if ( unlikely(rc) )
-            break;
-
-        guest_handle_add_offset(ureqs, 1);
-    }
-
-    if ( rc == -ERESTART )
-    {
-        ASSERT(i < count);
-        rc = hypercall_create_continuation(
-            __HYPERVISOR_mmu_update, "hihi",
-            ureqs, (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom);
-    }
-    else if ( curr->arch.old_guest_table )
-    {
-        XEN_GUEST_HANDLE_PARAM(void) null;
-
-        ASSERT(rc || i == count);
-        set_xen_guest_handle(null, NULL);
-        /*
-         * In order to have a way to communicate the final return value to
-         * our continuation, we pass this in place of "foreigndom", building
-         * on the fact that this argument isn't needed anymore.
-         */
-        rc = hypercall_create_continuation(
-                __HYPERVISOR_mmu_update, "hihi", null,
-                MMU_UPDATE_PREEMPTED, null, rc);
-    }
-
-    mm_put_pg_owner(pg_owner);
-
-    domain_mmap_cache_destroy(&mapcache);
-
-    perfc_add(num_page_updates, i);
-
- out:
-    if ( pt_owner != d )
-        rcu_unlock_domain(pt_owner);
-
-    /* Add incremental work we have done to the @done output parameter. */
-    if ( unlikely(!guest_handle_is_null(pdone)) )
-    {
-        done += i;
-        copy_to_guest(pdone, &done, 1);
-    }
-
-    return rc;
-}
-
-
-static int create_grant_pte_mapping(
-    uint64_t pte_addr, l1_pgentry_t nl1e, struct vcpu *v)
-{
-    int rc = GNTST_okay;
-    void *va;
-    unsigned long gmfn, mfn;
-    struct page_info *page;
-    l1_pgentry_t ol1e;
-    struct domain *d = v->domain;
-
-    adjust_guest_l1e(nl1e, d);
-
-    gmfn = pte_addr >> PAGE_SHIFT;
-    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
-
-    if ( unlikely(!page) )
-    {
-        MEM_LOG("Could not get page for normal update");
-        return GNTST_general_error;
-    }
-    
-    mfn = page_to_mfn(page);
-    va = map_domain_page(_mfn(mfn));
-    va = (void *)((unsigned long)va + ((unsigned long)pte_addr & ~PAGE_MASK));
-
-    if ( !page_lock(page) )
-    {
-        rc = GNTST_general_error;
-        goto failed;
-    }
-
-    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-    {
-        page_unlock(page);
-        rc = GNTST_general_error;
-        goto failed;
-    }
-
-    ol1e = *(l1_pgentry_t *)va;
-    if ( !UPDATE_ENTRY(l1, (l1_pgentry_t *)va, ol1e, nl1e, mfn, v, 0) )
-    {
-        page_unlock(page);
-        rc = GNTST_general_error;
-        goto failed;
-    } 
-
-    page_unlock(page);
-
-    if ( !paging_mode_refcounts(d) )
-        put_page_from_l1e(ol1e, d);
-
- failed:
-    unmap_domain_page(va);
-    put_page(page);
-
-    return rc;
-}
-
-static int destroy_grant_pte_mapping(
-    uint64_t addr, unsigned long frame, struct domain *d)
-{
-    int rc = GNTST_okay;
-    void *va;
-    unsigned long gmfn, mfn;
-    struct page_info *page;
-    l1_pgentry_t ol1e;
-
-    gmfn = addr >> PAGE_SHIFT;
-    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
-
-    if ( unlikely(!page) )
-    {
-        MEM_LOG("Could not get page for normal update");
-        return GNTST_general_error;
-    }
-    
-    mfn = page_to_mfn(page);
-    va = map_domain_page(_mfn(mfn));
-    va = (void *)((unsigned long)va + ((unsigned long)addr & ~PAGE_MASK));
-
-    if ( !page_lock(page) )
-    {
-        rc = GNTST_general_error;
-        goto failed;
-    }
-
-    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-    {
-        page_unlock(page);
-        rc = GNTST_general_error;
-        goto failed;
-    }
-
-    ol1e = *(l1_pgentry_t *)va;
-    
-    /* Check that the virtual address supplied is actually mapped to frame. */
-    if ( unlikely(l1e_get_pfn(ol1e) != frame) )
-    {
-        page_unlock(page);
-        MEM_LOG("PTE entry %lx for address %"PRIx64" doesn't match frame %lx",
-                (unsigned long)l1e_get_intpte(ol1e), addr, frame);
-        rc = GNTST_general_error;
-        goto failed;
-    }
-
-    /* Delete pagetable entry. */
-    if ( unlikely(!UPDATE_ENTRY
-                  (l1, 
-                   (l1_pgentry_t *)va, ol1e, l1e_empty(), mfn, 
-                   d->vcpu[0] /* Change if we go to per-vcpu shadows. */,
-                   0)) )
-    {
-        page_unlock(page);
-        MEM_LOG("Cannot delete PTE entry at %p", va);
-        rc = GNTST_general_error;
-        goto failed;
-    }
-
-    page_unlock(page);
-
- failed:
-    unmap_domain_page(va);
-    put_page(page);
-    return rc;
-}
-
-
-static int create_grant_va_mapping(
-    unsigned long va, l1_pgentry_t nl1e, struct vcpu *v)
-{
-    l1_pgentry_t *pl1e, ol1e;
-    struct domain *d = v->domain;
-    unsigned long gl1mfn;
-    struct page_info *l1pg;
-    int okay;
-    
-    adjust_guest_l1e(nl1e, d);
-
-    pl1e = guest_map_l1e(va, &gl1mfn);
-    if ( !pl1e )
-    {
-        MEM_LOG("Could not find L1 PTE for address %lx", va);
-        return GNTST_general_error;
-    }
-
-    if ( !get_page_from_pagenr(gl1mfn, current->domain) )
-    {
-        guest_unmap_l1e(pl1e);
-        return GNTST_general_error;
-    }
-
-    l1pg = mfn_to_page(gl1mfn);
-    if ( !page_lock(l1pg) )
-    {
-        put_page(l1pg);
-        guest_unmap_l1e(pl1e);
-        return GNTST_general_error;
-    }
-
-    if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-    {
-        page_unlock(l1pg);
-        put_page(l1pg);
-        guest_unmap_l1e(pl1e);
-        return GNTST_general_error;
-    }
-
-    ol1e = *pl1e;
-    okay = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0);
-
-    page_unlock(l1pg);
-    put_page(l1pg);
-    guest_unmap_l1e(pl1e);
-
-    if ( okay && !paging_mode_refcounts(d) )
-        put_page_from_l1e(ol1e, d);
-
-    return okay ? GNTST_okay : GNTST_general_error;
-}
-
-static int replace_grant_va_mapping(
-    unsigned long addr, unsigned long frame, l1_pgentry_t nl1e, struct vcpu *v)
-{
-    l1_pgentry_t *pl1e, ol1e;
-    unsigned long gl1mfn;
-    struct page_info *l1pg;
-    int rc = 0;
-    
-    pl1e = guest_map_l1e(addr, &gl1mfn);
-    if ( !pl1e )
-    {
-        MEM_LOG("Could not find L1 PTE for address %lx", addr);
-        return GNTST_general_error;
-    }
-
-    if ( !get_page_from_pagenr(gl1mfn, current->domain) )
-    {
-        rc = GNTST_general_error;
-        goto out;
-    }
-
-    l1pg = mfn_to_page(gl1mfn);
-    if ( !page_lock(l1pg) )
-    {
-        rc = GNTST_general_error;
-        put_page(l1pg);
-        goto out;
-    }
-
-    if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-    {
-        rc = GNTST_general_error;
-        goto unlock_and_out;
-    }
-
-    ol1e = *pl1e;
-
-    /* Check that the virtual address supplied is actually mapped to frame. */
-    if ( unlikely(l1e_get_pfn(ol1e) != frame) )
-    {
-        MEM_LOG("PTE entry %lx for address %lx doesn't match frame %lx",
-                l1e_get_pfn(ol1e), addr, frame);
-        rc = GNTST_general_error;
-        goto unlock_and_out;
-    }
-
-    /* Delete pagetable entry. */
-    if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0)) )
-    {
-        MEM_LOG("Cannot delete PTE entry at %p", (unsigned long *)pl1e);
-        rc = GNTST_general_error;
-        goto unlock_and_out;
-    }
-
- unlock_and_out:
-    page_unlock(l1pg);
-    put_page(l1pg);
- out:
-    guest_unmap_l1e(pl1e);
-    return rc;
-}
-
-static int destroy_grant_va_mapping(
-    unsigned long addr, unsigned long frame, struct vcpu *v)
-{
-    return replace_grant_va_mapping(addr, frame, l1e_empty(), v);
-}
-
 static int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
                                     unsigned int flags,
                                     unsigned int cache_flags)
@@ -4170,140 +2987,38 @@ static int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
         return GNTST_okay;
 }
 
-static int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
-                                   unsigned int flags, unsigned int cache_flags)
-{
-    l1_pgentry_t pte;
-    uint32_t grant_pte_flags;
-
-    grant_pte_flags =
-        _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_GNTTAB;
-    if ( cpu_has_nx )
-        grant_pte_flags |= _PAGE_NX_BIT;
-
-    pte = l1e_from_pfn(frame, grant_pte_flags);
-    if ( (flags & GNTMAP_application_map) )
-        l1e_add_flags(pte,_PAGE_USER);
-    if ( !(flags & GNTMAP_readonly) )
-        l1e_add_flags(pte,_PAGE_RW);
-
-    l1e_add_flags(pte,
-                  ((flags >> _GNTMAP_guest_avail0) * _PAGE_AVAIL0)
-                   & _PAGE_AVAIL);
-
-    l1e_add_flags(pte, cacheattr_to_pte_flags(cache_flags >> 5));
-
-    if ( flags & GNTMAP_contains_pte )
-        return create_grant_pte_mapping(addr, pte, current);
-    return create_grant_va_mapping(addr, pte, current);
-}
-
 int create_grant_host_mapping(uint64_t addr, unsigned long frame,
                               unsigned int flags, unsigned int cache_flags)
 {
     if ( paging_mode_external(current->domain) )
         return create_grant_p2m_mapping(addr, frame, flags, cache_flags);
 
-    return create_grant_pv_mapping(addr, frame, flags, cache_flags);
-}
-
-static int replace_grant_p2m_mapping(
-    uint64_t addr, unsigned long frame, uint64_t new_addr, unsigned int flags)
-{
-    unsigned long gfn = (unsigned long)(addr >> PAGE_SHIFT);
-    p2m_type_t type;
-    mfn_t old_mfn;
-    struct domain *d = current->domain;
-
-    if ( new_addr != 0 || (flags & GNTMAP_contains_pte) )
-        return GNTST_general_error;
-
-    old_mfn = get_gfn(d, gfn, &type);
-    if ( !p2m_is_grant(type) || mfn_x(old_mfn) != frame )
-    {
-        put_gfn(d, gfn);
-        MEM_LOG("replace_grant_p2m_mapping: old mapping invalid (type %d, mfn %lx, frame %lx)",
-                type, mfn_x(old_mfn), frame);
-        return GNTST_general_error;
-    }
-    guest_physmap_remove_page(d, _gfn(gfn), _mfn(frame), PAGE_ORDER_4K);
-
-    put_gfn(d, gfn);
-    return GNTST_okay;
-}
-
-static int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
-                                    uint64_t new_addr, unsigned int flags)
-{
-    struct vcpu *curr = current;
-    l1_pgentry_t *pl1e, ol1e;
-    unsigned long gl1mfn;
-    struct page_info *l1pg;
-    int rc;
-
-    if ( flags & GNTMAP_contains_pte )
-    {
-        if ( !new_addr )
-            return destroy_grant_pte_mapping(addr, frame, curr->domain);
-
-        MEM_LOG("Unsupported grant table operation");
-        return GNTST_general_error;
-    }
-
-    if ( !new_addr )
-        return destroy_grant_va_mapping(addr, frame, curr);
-
-    pl1e = guest_map_l1e(new_addr, &gl1mfn);
-    if ( !pl1e )
-    {
-        MEM_LOG("Could not find L1 PTE for address %lx",
-                (unsigned long)new_addr);
-        return GNTST_general_error;
-    }
-
-    if ( !get_page_from_pagenr(gl1mfn, current->domain) )
-    {
-        guest_unmap_l1e(pl1e);
-        return GNTST_general_error;
-    }
+    return create_grant_pv_mapping(addr, frame, flags, cache_flags);
+}
 
-    l1pg = mfn_to_page(gl1mfn);
-    if ( !page_lock(l1pg) )
-    {
-        put_page(l1pg);
-        guest_unmap_l1e(pl1e);
-        return GNTST_general_error;
-    }
+static int replace_grant_p2m_mapping(
+    uint64_t addr, unsigned long frame, uint64_t new_addr, unsigned int flags)
+{
+    unsigned long gfn = (unsigned long)(addr >> PAGE_SHIFT);
+    p2m_type_t type;
+    mfn_t old_mfn;
+    struct domain *d = current->domain;
 
-    if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-    {
-        page_unlock(l1pg);
-        put_page(l1pg);
-        guest_unmap_l1e(pl1e);
+    if ( new_addr != 0 || (flags & GNTMAP_contains_pte) )
         return GNTST_general_error;
-    }
-
-    ol1e = *pl1e;
 
-    if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, l1e_empty(),
-                                gl1mfn, curr, 0)) )
+    old_mfn = get_gfn(d, gfn, &type);
+    if ( !p2m_is_grant(type) || mfn_x(old_mfn) != frame )
     {
-        page_unlock(l1pg);
-        put_page(l1pg);
-        MEM_LOG("Cannot delete PTE entry at %p", (unsigned long *)pl1e);
-        guest_unmap_l1e(pl1e);
+        put_gfn(d, gfn);
+        MEM_LOG("replace_grant_p2m_mapping: old mapping invalid (type %d, mfn %lx, frame %lx)",
+                type, mfn_x(old_mfn), frame);
         return GNTST_general_error;
     }
+    guest_physmap_remove_page(d, _gfn(gfn), _mfn(frame), PAGE_ORDER_4K);
 
-    page_unlock(l1pg);
-    put_page(l1pg);
-    guest_unmap_l1e(pl1e);
-
-    rc = replace_grant_va_mapping(addr, frame, ol1e, curr);
-    if ( rc && !paging_mode_refcounts(curr->domain) )
-        put_page_from_l1e(ol1e, curr->domain);
-
-    return rc;
+    put_gfn(d, gfn);
+    return GNTST_okay;
 }
 
 int replace_grant_host_mapping(uint64_t addr, unsigned long frame,
@@ -4405,125 +3120,6 @@ int steal_page(
     return -1;
 }
 
-static int __do_update_va_mapping(
-    unsigned long va, u64 val64, unsigned long flags, struct domain *pg_owner)
-{
-    l1_pgentry_t   val = l1e_from_intpte(val64);
-    struct vcpu   *v   = current;
-    struct domain *d   = v->domain;
-    struct page_info *gl1pg;
-    l1_pgentry_t  *pl1e;
-    unsigned long  bmap_ptr, gl1mfn;
-    cpumask_t     *mask = NULL;
-    int            rc;
-
-    perfc_incr(calls_to_update_va);
-
-    rc = xsm_update_va_mapping(XSM_TARGET, d, pg_owner, val);
-    if ( rc )
-        return rc;
-
-    rc = -EINVAL;
-    pl1e = guest_map_l1e(va, &gl1mfn);
-    if ( unlikely(!pl1e || !get_page_from_pagenr(gl1mfn, d)) )
-        goto out;
-
-    gl1pg = mfn_to_page(gl1mfn);
-    if ( !page_lock(gl1pg) )
-    {
-        put_page(gl1pg);
-        goto out;
-    }
-
-    if ( (gl1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-    {
-        page_unlock(gl1pg);
-        put_page(gl1pg);
-        goto out;
-    }
-
-    rc = mod_l1_entry(pl1e, val, gl1mfn, 0, v, pg_owner);
-
-    page_unlock(gl1pg);
-    put_page(gl1pg);
-
- out:
-    if ( pl1e )
-        guest_unmap_l1e(pl1e);
-
-    switch ( flags & UVMF_FLUSHTYPE_MASK )
-    {
-    case UVMF_TLB_FLUSH:
-        switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) )
-        {
-        case UVMF_LOCAL:
-            flush_tlb_local();
-            break;
-        case UVMF_ALL:
-            mask = d->domain_dirty_cpumask;
-            break;
-        default:
-            mask = this_cpu(scratch_cpumask);
-            rc = mm_vcpumask_to_pcpumask(d,
-                                         const_guest_handle_from_ptr(bmap_ptr,
-                                                                     void),
-                                         mask);
-            break;
-        }
-        if ( mask )
-            flush_tlb_mask(mask);
-        break;
-
-    case UVMF_INVLPG:
-        switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) )
-        {
-        case UVMF_LOCAL:
-            paging_invlpg(v, va);
-            break;
-        case UVMF_ALL:
-            mask = d->domain_dirty_cpumask;
-            break;
-        default:
-            mask = this_cpu(scratch_cpumask);
-            rc = mm_vcpumask_to_pcpumask(d,
-                                         const_guest_handle_from_ptr(bmap_ptr,
-                                                                     void),
-                                         mask);
-            break;
-        }
-        if ( mask )
-            flush_tlb_one_mask(mask, va);
-        break;
-    }
-
-    return rc;
-}
-
-long do_update_va_mapping(unsigned long va, u64 val64,
-                          unsigned long flags)
-{
-    return __do_update_va_mapping(va, val64, flags, current->domain);
-}
-
-long do_update_va_mapping_otherdomain(unsigned long va, u64 val64,
-                                      unsigned long flags,
-                                      domid_t domid)
-{
-    struct domain *pg_owner;
-    int rc;
-
-    if ( (pg_owner = mm_get_pg_owner(domid)) == NULL )
-        return -ESRCH;
-
-    rc = __do_update_va_mapping(va, val64, flags, pg_owner);
-
-    mm_put_pg_owner(pg_owner);
-
-    return rc;
-}
-
-
-
 /*************************
  * Descriptor Tables
  */
@@ -5084,466 +3680,6 @@ long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     return 0;
 }
 
-
-/*************************
- * Writable Pagetables
- */
-
-struct ptwr_emulate_ctxt {
-    struct x86_emulate_ctxt ctxt;
-    unsigned long cr2;
-    l1_pgentry_t  pte;
-};
-
-static int ptwr_emulated_read(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_data,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    unsigned int rc = bytes;
-    unsigned long addr = offset;
-
-    if ( !__addr_ok(addr) ||
-         (rc = __copy_from_user(p_data, (void *)addr, bytes)) )
-    {
-        x86_emul_pagefault(0, addr + bytes - rc, ctxt);  /* Read fault. */
-        return X86EMUL_EXCEPTION;
-    }
-
-    return X86EMUL_OKAY;
-}
-
-static int ptwr_emulated_update(
-    unsigned long addr,
-    paddr_t old,
-    paddr_t val,
-    unsigned int bytes,
-    unsigned int do_cmpxchg,
-    struct ptwr_emulate_ctxt *ptwr_ctxt)
-{
-    unsigned long mfn;
-    unsigned long unaligned_addr = addr;
-    struct page_info *page;
-    l1_pgentry_t pte, ol1e, nl1e, *pl1e;
-    struct vcpu *v = current;
-    struct domain *d = v->domain;
-    int ret;
-
-    /* Only allow naturally-aligned stores within the original %cr2 page. */
-    if ( unlikely(((addr^ptwr_ctxt->cr2) & PAGE_MASK) || (addr & (bytes-1))) )
-    {
-        MEM_LOG("ptwr_emulate: bad access (cr2=%lx, addr=%lx, bytes=%u)",
-                ptwr_ctxt->cr2, addr, bytes);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    /* Turn a sub-word access into a full-word access. */
-    if ( bytes != sizeof(paddr_t) )
-    {
-        paddr_t      full;
-        unsigned int rc, offset = addr & (sizeof(paddr_t)-1);
-
-        /* Align address; read full word. */
-        addr &= ~(sizeof(paddr_t)-1);
-        if ( (rc = copy_from_user(&full, (void *)addr, sizeof(paddr_t))) != 0 )
-        {
-            x86_emul_pagefault(0, /* Read fault. */
-                               addr + sizeof(paddr_t) - rc,
-                               &ptwr_ctxt->ctxt);
-            return X86EMUL_EXCEPTION;
-        }
-        /* Mask out bits provided by caller. */
-        full &= ~((((paddr_t)1 << (bytes*8)) - 1) << (offset*8));
-        /* Shift the caller value and OR in the missing bits. */
-        val  &= (((paddr_t)1 << (bytes*8)) - 1);
-        val <<= (offset)*8;
-        val  |= full;
-        /* Also fill in missing parts of the cmpxchg old value. */
-        old  &= (((paddr_t)1 << (bytes*8)) - 1);
-        old <<= (offset)*8;
-        old  |= full;
-    }
-
-    pte  = ptwr_ctxt->pte;
-    mfn  = l1e_get_pfn(pte);
-    page = mfn_to_page(mfn);
-
-    /* We are looking only for read-only mappings of p.t. pages. */
-    ASSERT((l1e_get_flags(pte) & (_PAGE_RW|_PAGE_PRESENT)) == _PAGE_PRESENT);
-    ASSERT(mfn_valid(_mfn(mfn)));
-    ASSERT((page->u.inuse.type_info & PGT_type_mask) == PGT_l1_page_table);
-    ASSERT((page->u.inuse.type_info & PGT_count_mask) != 0);
-    ASSERT(page_get_owner(page) == d);
-
-    /* Check the new PTE. */
-    nl1e = l1e_from_intpte(val);
-    switch ( ret = get_page_from_l1e(nl1e, d, d) )
-    {
-    default:
-        if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
-             !do_cmpxchg && (l1e_get_flags(nl1e) & _PAGE_PRESENT) )
-        {
-            /*
-             * If this is an upper-half write to a PAE PTE then we assume that
-             * the guest has simply got the two writes the wrong way round. We
-             * zap the PRESENT bit on the assumption that the bottom half will
-             * be written immediately after we return to the guest.
-             */
-            gdprintk(XENLOG_DEBUG, "ptwr_emulate: fixing up invalid PAE PTE %"
-                     PRIpte"\n", l1e_get_intpte(nl1e));
-            l1e_remove_flags(nl1e, _PAGE_PRESENT);
-        }
-        else
-        {
-            MEM_LOG("ptwr_emulate: could not get_page_from_l1e()");
-            return X86EMUL_UNHANDLEABLE;
-        }
-        break;
-    case 0:
-        break;
-    case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
-        ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
-        l1e_flip_flags(nl1e, ret);
-        break;
-    }
-
-    adjust_guest_l1e(nl1e, d);
-
-    /* Checked successfully: do the update (write or cmpxchg). */
-    pl1e = map_domain_page(_mfn(mfn));
-    pl1e = (l1_pgentry_t *)((unsigned long)pl1e + (addr & ~PAGE_MASK));
-    if ( do_cmpxchg )
-    {
-        int okay;
-        intpte_t t = old;
-        ol1e = l1e_from_intpte(old);
-
-        okay = paging_cmpxchg_guest_entry(v, &l1e_get_intpte(*pl1e),
-                                          &t, l1e_get_intpte(nl1e), _mfn(mfn));
-        okay = (okay && t == old);
-
-        if ( !okay )
-        {
-            unmap_domain_page(pl1e);
-            put_page_from_l1e(nl1e, d);
-            return X86EMUL_RETRY;
-        }
-    }
-    else
-    {
-        ol1e = *pl1e;
-        if ( !UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn, v, 0) )
-            BUG();
-    }
-
-    trace_ptwr_emulation(addr, nl1e);
-
-    unmap_domain_page(pl1e);
-
-    /* Finally, drop the old PTE. */
-    put_page_from_l1e(ol1e, d);
-
-    return X86EMUL_OKAY;
-}
-
-static int ptwr_emulated_write(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_data,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    paddr_t val = 0;
-
-    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) || !bytes )
-    {
-        MEM_LOG("ptwr_emulate: bad write size (addr=%lx, bytes=%u)",
-                offset, bytes);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    memcpy(&val, p_data, bytes);
-
-    return ptwr_emulated_update(
-        offset, 0, val, bytes, 0,
-        container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
-}
-
-static int ptwr_emulated_cmpxchg(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_old,
-    void *p_new,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    paddr_t old = 0, new = 0;
-
-    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes -1)) )
-    {
-        MEM_LOG("ptwr_emulate: bad cmpxchg size (addr=%lx, bytes=%u)",
-                offset, bytes);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    memcpy(&old, p_old, bytes);
-    memcpy(&new, p_new, bytes);
-
-    return ptwr_emulated_update(
-        offset, old, new, bytes, 1,
-        container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
-}
-
-static int pv_emul_is_mem_write(const struct x86_emulate_state *state,
-                                struct x86_emulate_ctxt *ctxt)
-{
-    return x86_insn_is_mem_write(state, ctxt) ? X86EMUL_OKAY
-                                              : X86EMUL_UNHANDLEABLE;
-}
-
-static const struct x86_emulate_ops ptwr_emulate_ops = {
-    .read       = ptwr_emulated_read,
-    .insn_fetch = ptwr_emulated_read,
-    .write      = ptwr_emulated_write,
-    .cmpxchg    = ptwr_emulated_cmpxchg,
-    .validate   = pv_emul_is_mem_write,
-    .cpuid      = pv_emul_cpuid,
-};
-
-/* Write page fault handler: check if guest is trying to modify a PTE. */
-int ptwr_do_page_fault(struct vcpu *v, unsigned long addr, 
-                       struct cpu_user_regs *regs)
-{
-    struct domain *d = v->domain;
-    struct page_info *page;
-    l1_pgentry_t      pte;
-    struct ptwr_emulate_ctxt ptwr_ctxt = {
-        .ctxt = {
-            .regs = regs,
-            .vendor = d->arch.cpuid->x86_vendor,
-            .addr_size = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
-            .sp_size   = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
-            .swint_emulate = x86_swint_emulate_none,
-        },
-    };
-    int rc;
-
-    /* Attempt to read the PTE that maps the VA being accessed. */
-    guest_get_eff_l1e(addr, &pte);
-
-    /* We are looking only for read-only mappings of p.t. pages. */
-    if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) ||
-         rangeset_contains_singleton(mmio_ro_ranges, l1e_get_pfn(pte)) ||
-         !get_page_from_pagenr(l1e_get_pfn(pte), d) )
-        goto bail;
-
-    page = l1e_get_page(pte);
-    if ( !page_lock(page) )
-    {
-        put_page(page);
-        goto bail;
-    }
-
-    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-    {
-        page_unlock(page);
-        put_page(page);
-        goto bail;
-    }
-
-    ptwr_ctxt.cr2 = addr;
-    ptwr_ctxt.pte = pte;
-
-    rc = x86_emulate(&ptwr_ctxt.ctxt, &ptwr_emulate_ops);
-
-    page_unlock(page);
-    put_page(page);
-
-    switch ( rc )
-    {
-    case X86EMUL_EXCEPTION:
-        /*
-         * This emulation only covers writes to pagetables which are marked
-         * read-only by Xen.  We tolerate #PF (in case a concurrent pagetable
-         * update has succeeded on a different vcpu).  Anything else is an
-         * emulation bug, or a guest playing with the instruction stream under
-         * Xen's feet.
-         */
-        if ( ptwr_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
-             ptwr_ctxt.ctxt.event.vector == TRAP_page_fault )
-            pv_inject_event(&ptwr_ctxt.ctxt.event);
-        else
-            gdprintk(XENLOG_WARNING,
-                     "Unexpected event (type %u, vector %#x) from emulation\n",
-                     ptwr_ctxt.ctxt.event.type, ptwr_ctxt.ctxt.event.vector);
-
-        /* Fallthrough */
-    case X86EMUL_OKAY:
-
-        if ( ptwr_ctxt.ctxt.retire.singlestep )
-            pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
-
-        /* Fallthrough */
-    case X86EMUL_RETRY:
-        perfc_incr(ptwr_emulations);
-        return EXCRET_fault_fixed;
-    }
-
- bail:
-    return 0;
-}
-
-/*************************
- * fault handling for read-only MMIO pages
- */
-
-int mmio_ro_emulated_write(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_data,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    struct mmio_ro_emulate_ctxt *mmio_ro_ctxt = ctxt->data;
-
-    /* Only allow naturally-aligned stores at the original %cr2 address. */
-    if ( ((bytes | offset) & (bytes - 1)) || !bytes ||
-         offset != mmio_ro_ctxt->cr2 )
-    {
-        MEM_LOG("mmio_ro_emulate: bad access (cr2=%lx, addr=%lx, bytes=%u)",
-                mmio_ro_ctxt->cr2, offset, bytes);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    return X86EMUL_OKAY;
-}
-
-static const struct x86_emulate_ops mmio_ro_emulate_ops = {
-    .read       = x86emul_unhandleable_rw,
-    .insn_fetch = ptwr_emulated_read,
-    .write      = mmio_ro_emulated_write,
-    .validate   = pv_emul_is_mem_write,
-    .cpuid      = pv_emul_cpuid,
-};
-
-int mmcfg_intercept_write(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_data,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    struct mmio_ro_emulate_ctxt *mmio_ctxt = ctxt->data;
-
-    /*
-     * Only allow naturally-aligned stores no wider than 4 bytes to the
-     * original %cr2 address.
-     */
-    if ( ((bytes | offset) & (bytes - 1)) || bytes > 4 || !bytes ||
-         offset != mmio_ctxt->cr2 )
-    {
-        MEM_LOG("mmcfg_intercept: bad write (cr2=%lx, addr=%lx, bytes=%u)",
-                mmio_ctxt->cr2, offset, bytes);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    offset &= 0xfff;
-    if ( pci_conf_write_intercept(mmio_ctxt->seg, mmio_ctxt->bdf,
-                                  offset, bytes, p_data) >= 0 )
-        pci_mmcfg_write(mmio_ctxt->seg, PCI_BUS(mmio_ctxt->bdf),
-                        PCI_DEVFN2(mmio_ctxt->bdf), offset, bytes,
-                        *(uint32_t *)p_data);
-
-    return X86EMUL_OKAY;
-}
-
-static const struct x86_emulate_ops mmcfg_intercept_ops = {
-    .read       = x86emul_unhandleable_rw,
-    .insn_fetch = ptwr_emulated_read,
-    .write      = mmcfg_intercept_write,
-    .validate   = pv_emul_is_mem_write,
-    .cpuid      = pv_emul_cpuid,
-};
-
-/* Check if guest is trying to modify a r/o MMIO page. */
-int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr,
-                          struct cpu_user_regs *regs)
-{
-    l1_pgentry_t pte;
-    unsigned long mfn;
-    unsigned int addr_size = is_pv_32bit_vcpu(v) ? 32 : BITS_PER_LONG;
-    struct mmio_ro_emulate_ctxt mmio_ro_ctxt = { .cr2 = addr };
-    struct x86_emulate_ctxt ctxt = {
-        .regs = regs,
-        .vendor = v->domain->arch.cpuid->x86_vendor,
-        .addr_size = addr_size,
-        .sp_size = addr_size,
-        .swint_emulate = x86_swint_emulate_none,
-        .data = &mmio_ro_ctxt
-    };
-    int rc;
-
-    /* Attempt to read the PTE that maps the VA being accessed. */
-    guest_get_eff_l1e(addr, &pte);
-
-    /* We are looking only for read-only mappings of MMIO pages. */
-    if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) )
-        return 0;
-
-    mfn = l1e_get_pfn(pte);
-    if ( mfn_valid(_mfn(mfn)) )
-    {
-        struct page_info *page = mfn_to_page(mfn);
-        struct domain *owner = page_get_owner_and_reference(page);
-
-        if ( owner )
-            put_page(page);
-        if ( owner != dom_io )
-            return 0;
-    }
-
-    if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
-        return 0;
-
-    if ( pci_ro_mmcfg_decode(mfn, &mmio_ro_ctxt.seg, &mmio_ro_ctxt.bdf) )
-        rc = x86_emulate(&ctxt, &mmcfg_intercept_ops);
-    else
-        rc = x86_emulate(&ctxt, &mmio_ro_emulate_ops);
-
-    switch ( rc )
-    {
-    case X86EMUL_EXCEPTION:
-        /*
-         * This emulation only covers writes to MMCFG space or read-only MFNs.
-         * We tolerate #PF (from hitting an adjacent page or a successful
-         * concurrent pagetable update).  Anything else is an emulation bug,
-         * or a guest playing with the instruction stream under Xen's feet.
-         */
-        if ( ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
-             ctxt.event.vector == TRAP_page_fault )
-            pv_inject_event(&ctxt.event);
-        else
-            gdprintk(XENLOG_WARNING,
-                     "Unexpected event (type %u, vector %#x) from emulation\n",
-                     ctxt.event.type, ctxt.event.vector);
-
-        /* Fallthrough */
-    case X86EMUL_OKAY:
-
-        if ( ctxt.retire.singlestep )
-            pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
-
-        /* Fallthrough */
-    case X86EMUL_RETRY:
-        perfc_incr(ptwr_emulations);
-        return EXCRET_fault_fixed;
-    }
-
-    return 0;
-}
-
 void *alloc_xen_pagetable(void)
 {
     if ( system_state != SYS_STATE_early_boot )
diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile
index ea94599438..665be5536c 100644
--- a/xen/arch/x86/pv/Makefile
+++ b/xen/arch/x86/pv/Makefile
@@ -1,2 +1,3 @@
 obj-y += hypercall.o
 obj-bin-y += dom0_build.init.o
+obj-y += mm.o
diff --git a/xen/arch/x86/pv/mm.c b/xen/arch/x86/pv/mm.c
new file mode 100644
index 0000000000..fd157b9f58
--- /dev/null
+++ b/xen/arch/x86/pv/mm.c
@@ -0,0 +1,1902 @@
+/******************************************************************************
+ * arch/x86/pv/mm.c
+ *
+ * Copyright (c) 2002-2005 K A Fraser
+ * Copyright (c) 2004 Christian Limpach
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * A description of the x86 page table API:
+ *
+ * Domains trap to do_mmu_update with a list of update requests.
+ * This is a list of (ptr, val) pairs, where the requested operation
+ * is *ptr = val.
+ *
+ * Reference counting of pages:
+ * ----------------------------
+ * Each page has two refcounts: tot_count and type_count.
+ *
+ * TOT_COUNT is the obvious reference count. It counts all uses of a
+ * physical page frame by a domain, including uses as a page directory,
+ * a page table, or simple mappings via a PTE. This count prevents a
+ * domain from releasing a frame back to the free pool when it still holds
+ * a reference to it.
+ *
+ * TYPE_COUNT is more subtle. A frame can be put to one of three
+ * mutually-exclusive uses: it might be used as a page directory, or a
+ * page table, or it may be mapped writable by the domain [of course, a
+ * frame may not be used in any of these three ways!].
+ * So, type_count is a count of the number of times a frame is being
+ * referred to in its current incarnation. Therefore, a page can only
+ * change its type when its type count is zero.
+ *
+ * Pinning the page type:
+ * ----------------------
+ * The type of a page can be pinned/unpinned with the commands
+ * MMUEXT_[UN]PIN_L?_TABLE. Each page can be pinned exactly once (that is,
+ * pinning is not reference counted, so it can't be nested).
+ * This is useful to prevent a page's type count falling to zero, at which
+ * point safety checks would need to be carried out next time the count
+ * is increased again.
+ *
+ * A further note on writable page mappings:
+ * -----------------------------------------
+ * For simplicity, the count of writable mappings for a page may not
+ * correspond to reality. The 'writable count' is incremented for every
+ * PTE which maps the page with the _PAGE_RW flag set. However, for
+ * write access to be possible the page directory entry must also have
+ * its _PAGE_RW bit set. We do not check this as it complicates the
+ * reference counting considerably [consider the case of multiple
+ * directory entries referencing a single page table, some with the RW
+ * bit set, others not -- it starts getting a bit messy].
+ * In normal use, this simplification shouldn't be a problem.
+ * However, the logic can be added if required.
+ *
+ * One more note on read-only page mappings:
+ * -----------------------------------------
+ * We want domains to be able to map pages for read-only access. The
+ * main reason is that page tables and directories should be readable
+ * by a domain, but it would not be safe for them to be writable.
+ * However, domains have free access to rings 1 & 2 of the Intel
+ * privilege model. In terms of page protection, these are considered
+ * to be part of 'supervisor mode'. The WP bit in CR0 controls whether
+ * read-only restrictions are respected in supervisor mode -- if the
+ * bit is clear then any mapped page is writable.
+ *
+ * We get round this by always setting the WP bit and disallowing
+ * updates to it. This is very unlikely to cause a problem for guest
+ * OS's, which will generally use the WP bit to simplify copy-on-write
+ * implementation (in that case, OS wants a fault when it writes to
+ * an application-supplied buffer).
+ */
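+
+/*
+ * Usage sketch, for illustration only (struct mmu_update and
+ * MMU_NORMAL_PT_UPDATE come from the public interface; HYPERVISOR_mmu_update
+ * stands for a guest-side hypercall wrapper; none of it is defined in this
+ * file): a guest submits a single normal PTE update roughly as
+ *
+ *     struct mmu_update req = {
+ *         .ptr = pte_maddr | MMU_NORMAL_PT_UPDATE,
+ *         .val = new_pte,
+ *     };
+ *     rc = HYPERVISOR_mmu_update(&req, 1, NULL, DOMID_SELF);
+ *
+ * where pte_maddr is the machine address of the PTE to write and the
+ * update command is encoded in the low bits of ptr.
+ */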
+
+#include <xen/event.h>
+#include <xen/guest_access.h>
+#include <xen/hypercall.h>
+#include <xen/mm.h>
+#include <xen/sched.h>
+#include <xen/trace.h>
+#include <xsm/xsm.h>
+
+#include <asm/p2m.h>
+#include <asm/paging.h>
+#include <asm/x86_emulate.h>
+
+/*
+ * PTE updates can be done with ordinary writes except:
+ *  1. Debug builds get extra checking by using CMPXCHG[8B].
+ */
+#if !defined(NDEBUG)
+#define PTE_UPDATE_WITH_CMPXCHG
+#endif
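+
+/*
+ * Concretely: without PTE_UPDATE_WITH_CMPXCHG (release builds), an update
+ * that need not preserve A/D bits is a single paging_write_guest_entry();
+ * debug builds, and any update that must preserve A/D bits, go through the
+ * paging_cmpxchg_guest_entry() retry loop in update_intpte() below.
+ */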
+
+/* How to write an entry to the guest pagetables.
+ * Returns 0 for failure (pointer not valid), 1 for success. */
+static inline int update_intpte(intpte_t *p,
+                                intpte_t old,
+                                intpte_t new,
+                                unsigned long mfn,
+                                struct vcpu *v,
+                                int preserve_ad)
+{
+    int rv = 1;
+#ifndef PTE_UPDATE_WITH_CMPXCHG
+    if ( !preserve_ad )
+    {
+        rv = paging_write_guest_entry(v, p, new, _mfn(mfn));
+    }
+    else
+#endif
+    {
+        intpte_t t = old;
+        for ( ; ; )
+        {
+            intpte_t _new = new;
+            if ( preserve_ad )
+                _new |= old & (_PAGE_ACCESSED | _PAGE_DIRTY);
+
+            rv = paging_cmpxchg_guest_entry(v, p, &t, _new, _mfn(mfn));
+            if ( unlikely(rv == 0) )
+            {
+                MEM_LOG("Failed to update %" PRIpte " -> %" PRIpte
+                        ": saw %" PRIpte, old, _new, t);
+                break;
+            }
+
+            if ( t == old )
+                break;
+
+            /* Allowed to change in Accessed/Dirty flags only. */
+            BUG_ON((t ^ old) & ~(intpte_t)(_PAGE_ACCESSED|_PAGE_DIRTY));
+
+            old = t;
+        }
+    }
+    return rv;
+}
+
+/* Macro that wraps the appropriate type-changes around update_intpte().
+ * Arguments are: type, ptr, old, new, mfn, vcpu */
+#define UPDATE_ENTRY(_t,_p,_o,_n,_m,_v,_ad)                         \
+    update_intpte(&_t ## e_get_intpte(*(_p)),                       \
+                  _t ## e_get_intpte(_o), _t ## e_get_intpte(_n),   \
+                  (_m), (_v), (_ad))
+
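+/*
+ * For example, UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn, v, ad) expands to
+ *
+ *     update_intpte(&l1e_get_intpte(*(pl1e)),
+ *                   l1e_get_intpte(ol1e), l1e_get_intpte(nl1e),
+ *                   (mfn), (v), (ad))
+ *
+ * i.e. the type argument merely selects the matching l<n>e accessors for
+ * the pagetable level being updated.
+ */
+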
+/*
+ * PTE flags that a guest may change without re-validating the PTE.
+ * All other bits affect translation, caching, or Xen's safety.
+ */
+#define FASTPATH_FLAG_WHITELIST                                     \
+    (_PAGE_NX_BIT | _PAGE_AVAIL_HIGH | _PAGE_AVAIL | _PAGE_GLOBAL | \
+     _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_USER)
+
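+/*
+ * For instance, a PTE rewrite that only toggles _PAGE_ACCESSED or
+ * _PAGE_DIRTY (or other whitelisted bits) on an otherwise identical
+ * entry may take the fast paths below; changing the frame, _PAGE_PRESENT
+ * or _PAGE_RW forces full re-validation.
+ */
+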
+/* Update the L1 entry at pl1e to new value nl1e. */
+static int mod_l1_entry(l1_pgentry_t *pl1e, l1_pgentry_t nl1e,
+                        unsigned long gl1mfn, int preserve_ad,
+                        struct vcpu *pt_vcpu, struct domain *pg_dom)
+{
+    l1_pgentry_t ol1e;
+    struct domain *pt_dom = pt_vcpu->domain;
+    int rc = 0;
+
+    if ( unlikely(__copy_from_user(&ol1e, pl1e, sizeof(ol1e)) != 0) )
+        return -EFAULT;
+
+    if ( unlikely(paging_mode_refcounts(pt_dom)) )
+    {
+        if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, preserve_ad) )
+            return 0;
+        return -EBUSY;
+    }
+
+    if ( l1e_get_flags(nl1e) & _PAGE_PRESENT )
+    {
+        /* Translate foreign guest addresses. */
+        struct page_info *page = NULL;
+
+        if ( unlikely(l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom)) )
+        {
+            MEM_LOG("Bad L1 flags %x",
+                    l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom));
+            return -EINVAL;
+        }
+
+        if ( paging_mode_translate(pg_dom) )
+        {
+            page = get_page_from_gfn(pg_dom, l1e_get_pfn(nl1e), NULL, P2M_ALLOC);
+            if ( !page )
+                return -EINVAL;
+            nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(nl1e));
+        }
+
+        /* Fast path for sufficiently-similar mappings. */
+        if ( !l1e_has_changed(ol1e, nl1e, ~FASTPATH_FLAG_WHITELIST) )
+        {
+            adjust_guest_l1e(nl1e, pt_dom);
+            rc = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
+                              preserve_ad);
+            if ( page )
+                put_page(page);
+            return rc ? 0 : -EBUSY;
+        }
+
+        switch ( rc = get_page_from_l1e(nl1e, pt_dom, pg_dom) )
+        {
+        default:
+            if ( page )
+                put_page(page);
+            return rc;
+        case 0:
+            break;
+        case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+            ASSERT(!(rc & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+            l1e_flip_flags(nl1e, rc);
+            rc = 0;
+            break;
+        }
+        if ( page )
+            put_page(page);
+
+        adjust_guest_l1e(nl1e, pt_dom);
+        if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
+                                    preserve_ad)) )
+        {
+            ol1e = nl1e;
+            rc = -EBUSY;
+        }
+    }
+    else if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
+                                     preserve_ad)) )
+    {
+        return -EBUSY;
+    }
+
+    put_page_from_l1e(ol1e, pt_dom);
+    return rc;
+}
+
+
+/* Update the L2 entry at pl2e to new value nl2e. pl2e is within frame pfn. */
+static int mod_l2_entry(l2_pgentry_t *pl2e,
+                        l2_pgentry_t nl2e,
+                        unsigned long pfn,
+                        int preserve_ad,
+                        struct vcpu *vcpu)
+{
+    l2_pgentry_t ol2e;
+    struct domain *d = vcpu->domain;
+    struct page_info *l2pg = mfn_to_page(pfn);
+    unsigned long type = l2pg->u.inuse.type_info;
+    int rc = 0;
+
+    if ( unlikely(!is_guest_l2_slot(d, type, pgentry_ptr_to_slot(pl2e))) )
+    {
+        MEM_LOG("Illegal L2 update attempt in Xen-private area %p", pl2e);
+        return -EPERM;
+    }
+
+    if ( unlikely(__copy_from_user(&ol2e, pl2e, sizeof(ol2e)) != 0) )
+        return -EFAULT;
+
+    if ( l2e_get_flags(nl2e) & _PAGE_PRESENT )
+    {
+        if ( unlikely(l2e_get_flags(nl2e) & L2_DISALLOW_MASK) )
+        {
+            MEM_LOG("Bad L2 flags %x",
+                    l2e_get_flags(nl2e) & L2_DISALLOW_MASK);
+            return -EINVAL;
+        }
+
+        /* Fast path for sufficiently-similar mappings. */
+        if ( !l2e_has_changed(ol2e, nl2e, ~FASTPATH_FLAG_WHITELIST) )
+        {
+            adjust_guest_l2e(nl2e, d);
+            if ( UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu, preserve_ad) )
+                return 0;
+            return -EBUSY;
+        }
+
+        if ( unlikely((rc = get_page_from_l2e(nl2e, pfn, d)) < 0) )
+            return rc;
+
+        adjust_guest_l2e(nl2e, d);
+        if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu,
+                                    preserve_ad)) )
+        {
+            ol2e = nl2e;
+            rc = -EBUSY;
+        }
+    }
+    else if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu,
+                                     preserve_ad)) )
+    {
+        return -EBUSY;
+    }
+
+    put_page_from_l2e(ol2e, pfn);
+    return rc;
+}
+
+/* Update the L3 entry at pl3e to new value nl3e. pl3e is within frame pfn. */
+static int mod_l3_entry(l3_pgentry_t *pl3e,
+                        l3_pgentry_t nl3e,
+                        unsigned long pfn,
+                        int preserve_ad,
+                        struct vcpu *vcpu)
+{
+    l3_pgentry_t ol3e;
+    struct domain *d = vcpu->domain;
+    int rc = 0;
+
+    if ( unlikely(!is_guest_l3_slot(pgentry_ptr_to_slot(pl3e))) )
+    {
+        MEM_LOG("Illegal L3 update attempt in Xen-private area %p", pl3e);
+        return -EINVAL;
+    }
+
+    /*
+     * Disallow updates to final L3 slot. It contains Xen mappings, and it
+     * would be a pain to ensure they remain continuously valid throughout.
+     */
+    if ( is_pv_32bit_domain(d) && (pgentry_ptr_to_slot(pl3e) >= 3) )
+        return -EINVAL;
+
+    if ( unlikely(__copy_from_user(&ol3e, pl3e, sizeof(ol3e)) != 0) )
+        return -EFAULT;
+
+    if ( l3e_get_flags(nl3e) & _PAGE_PRESENT )
+    {
+        if ( unlikely(l3e_get_flags(nl3e) & l3_disallow_mask(d)) )
+        {
+            MEM_LOG("Bad L3 flags %x",
+                    l3e_get_flags(nl3e) & l3_disallow_mask(d));
+            return -EINVAL;
+        }
+
+        /* Fast path for sufficiently-similar mappings. */
+        if ( !l3e_has_changed(ol3e, nl3e, ~FASTPATH_FLAG_WHITELIST) )
+        {
+            adjust_guest_l3e(nl3e, d);
+            rc = UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu, preserve_ad);
+            return rc ? 0 : -EFAULT;
+        }
+
+        rc = get_page_from_l3e(nl3e, pfn, d, 0);
+        if ( unlikely(rc < 0) )
+            return rc;
+        rc = 0;
+
+        adjust_guest_l3e(nl3e, d);
+        if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu,
+                                    preserve_ad)) )
+        {
+            ol3e = nl3e;
+            rc = -EFAULT;
+        }
+    }
+    else if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu,
+                                     preserve_ad)) )
+    {
+        return -EFAULT;
+    }
+
+    if ( likely(rc == 0) )
+        if ( !create_pae_xen_mappings(d, pl3e) )
+            BUG();
+
+    put_page_from_l3e(ol3e, pfn, 0, 1);
+    return rc;
+}
+
+/* Update the L4 entry at pl4e to new value nl4e. pl4e is within frame pfn. */
+static int mod_l4_entry(l4_pgentry_t *pl4e,
+                        l4_pgentry_t nl4e,
+                        unsigned long pfn,
+                        int preserve_ad,
+                        struct vcpu *vcpu)
+{
+    struct domain *d = vcpu->domain;
+    l4_pgentry_t ol4e;
+    int rc = 0;
+
+    if ( unlikely(!is_guest_l4_slot(d, pgentry_ptr_to_slot(pl4e))) )
+    {
+        MEM_LOG("Illegal L4 update attempt in Xen-private area %p", pl4e);
+        return -EINVAL;
+    }
+
+    if ( unlikely(__copy_from_user(&ol4e, pl4e, sizeof(ol4e)) != 0) )
+        return -EFAULT;
+
+    if ( l4e_get_flags(nl4e) & _PAGE_PRESENT )
+    {
+        if ( unlikely(l4e_get_flags(nl4e) & L4_DISALLOW_MASK) )
+        {
+            MEM_LOG("Bad L4 flags %x",
+                    l4e_get_flags(nl4e) & L4_DISALLOW_MASK);
+            return -EINVAL;
+        }
+
+        /* Fast path for sufficiently-similar mappings. */
+        if ( !l4e_has_changed(ol4e, nl4e, ~FASTPATH_FLAG_WHITELIST) )
+        {
+            adjust_guest_l4e(nl4e, d);
+            rc = UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu, preserve_ad);
+            return rc ? 0 : -EFAULT;
+        }
+
+        rc = get_page_from_l4e(nl4e, pfn, d, 0);
+        if ( unlikely(rc < 0) )
+            return rc;
+        rc = 0;
+
+        adjust_guest_l4e(nl4e, d);
+        if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu,
+                                    preserve_ad)) )
+        {
+            ol4e = nl4e;
+            rc = -EFAULT;
+        }
+    }
+    else if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu,
+                                     preserve_ad)) )
+    {
+        return -EFAULT;
+    }
+
+    put_page_from_l4e(ol4e, pfn, 0, 1);
+    return rc;
+}
+
+/* Get a mapping of a PV guest's l1e for this virtual address. */
+static l1_pgentry_t *guest_map_l1e(unsigned long addr, unsigned long *gl1mfn)
+{
+    l2_pgentry_t l2e;
+
+    ASSERT(!paging_mode_translate(current->domain));
+    ASSERT(!paging_mode_external(current->domain));
+
+    if ( unlikely(!__addr_ok(addr)) )
+        return NULL;
+
+    /* Find this l1e and its enclosing l1mfn in the linear map. */
+    if ( __copy_from_user(&l2e,
+                          &__linear_l2_table[l2_linear_offset(addr)],
+                          sizeof(l2_pgentry_t)) )
+        return NULL;
+
+    /* Check the l2e's flags to make sure it is safe to read the l1e. */
+    if ( (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT )
+        return NULL;
+
+    *gl1mfn = l2e_get_pfn(l2e);
+
+    return (l1_pgentry_t *)map_domain_page(_mfn(*gl1mfn)) +
+           l1_table_offset(addr);
+}
+
+/* Pull down the mapping we got from guest_map_l1e(). */
+static inline void guest_unmap_l1e(void *p)
+{
+    unmap_domain_page(p);
+}
+
+static int create_grant_va_mapping(
+    unsigned long va, l1_pgentry_t nl1e, struct vcpu *v)
+{
+    l1_pgentry_t *pl1e, ol1e;
+    struct domain *d = v->domain;
+    unsigned long gl1mfn;
+    struct page_info *l1pg;
+    int okay;
+
+    adjust_guest_l1e(nl1e, d);
+
+    pl1e = guest_map_l1e(va, &gl1mfn);
+    if ( !pl1e )
+    {
+        MEM_LOG("Could not find L1 PTE for address %lx", va);
+        return GNTST_general_error;
+    }
+
+    if ( !get_page_from_pagenr(gl1mfn, current->domain) )
+    {
+        guest_unmap_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    l1pg = mfn_to_page(gl1mfn);
+    if ( !page_lock(l1pg) )
+    {
+        put_page(l1pg);
+        guest_unmap_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(l1pg);
+        put_page(l1pg);
+        guest_unmap_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    ol1e = *pl1e;
+    okay = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0);
+
+    page_unlock(l1pg);
+    put_page(l1pg);
+    guest_unmap_l1e(pl1e);
+
+    if ( okay && !paging_mode_refcounts(d) )
+        put_page_from_l1e(ol1e, d);
+
+    return okay ? GNTST_okay : GNTST_general_error;
+}
+
+static int replace_grant_va_mapping(
+    unsigned long addr, unsigned long frame, l1_pgentry_t nl1e, struct vcpu *v)
+{
+    l1_pgentry_t *pl1e, ol1e;
+    unsigned long gl1mfn;
+    struct page_info *l1pg;
+    int rc = 0;
+
+    pl1e = guest_map_l1e(addr, &gl1mfn);
+    if ( !pl1e )
+    {
+        MEM_LOG("Could not find L1 PTE for address %lx", addr);
+        return GNTST_general_error;
+    }
+
+    if ( !get_page_from_pagenr(gl1mfn, current->domain) )
+    {
+        rc = GNTST_general_error;
+        goto out;
+    }
+
+    l1pg = mfn_to_page(gl1mfn);
+    if ( !page_lock(l1pg) )
+    {
+        rc = GNTST_general_error;
+        put_page(l1pg);
+        goto out;
+    }
+
+    if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        rc = GNTST_general_error;
+        goto unlock_and_out;
+    }
+
+    ol1e = *pl1e;
+
+    /* Check that the virtual address supplied is actually mapped to frame. */
+    if ( unlikely(l1e_get_pfn(ol1e) != frame) )
+    {
+        MEM_LOG("PTE entry %lx for address %lx doesn't match frame %lx",
+                l1e_get_pfn(ol1e), addr, frame);
+        rc = GNTST_general_error;
+        goto unlock_and_out;
+    }
+
+    /* Delete pagetable entry. */
+    if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0)) )
+    {
+        MEM_LOG("Cannot delete PTE entry at %p", (unsigned long *)pl1e);
+        rc = GNTST_general_error;
+        goto unlock_and_out;
+    }
+
+ unlock_and_out:
+    page_unlock(l1pg);
+    put_page(l1pg);
+ out:
+    guest_unmap_l1e(pl1e);
+    return rc;
+}
+
+static int create_grant_pte_mapping(
+    uint64_t pte_addr, l1_pgentry_t nl1e, struct vcpu *v)
+{
+    int rc = GNTST_okay;
+    void *va;
+    unsigned long gmfn, mfn;
+    struct page_info *page;
+    l1_pgentry_t ol1e;
+    struct domain *d = v->domain;
+
+    adjust_guest_l1e(nl1e, d);
+
+    gmfn = pte_addr >> PAGE_SHIFT;
+    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
+
+    if ( unlikely(!page) )
+    {
+        MEM_LOG("Could not get page for normal update");
+        return GNTST_general_error;
+    }
+
+    mfn = page_to_mfn(page);
+    va = map_domain_page(_mfn(mfn));
+    va = (void *)((unsigned long)va + ((unsigned long)pte_addr & ~PAGE_MASK));
+
+    if ( !page_lock(page) )
+    {
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(page);
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    ol1e = *(l1_pgentry_t *)va;
+    if ( !UPDATE_ENTRY(l1, (l1_pgentry_t *)va, ol1e, nl1e, mfn, v, 0) )
+    {
+        page_unlock(page);
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    page_unlock(page);
+
+    if ( !paging_mode_refcounts(d) )
+        put_page_from_l1e(ol1e, d);
+
+ failed:
+    unmap_domain_page(va);
+    put_page(page);
+
+    return rc;
+}
+
+static int destroy_grant_pte_mapping(
+    uint64_t addr, unsigned long frame, struct domain *d)
+{
+    int rc = GNTST_okay;
+    void *va;
+    unsigned long gmfn, mfn;
+    struct page_info *page;
+    l1_pgentry_t ol1e;
+
+    gmfn = addr >> PAGE_SHIFT;
+    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
+
+    if ( unlikely(!page) )
+    {
+        MEM_LOG("Could not get page for normal update");
+        return GNTST_general_error;
+    }
+
+    mfn = page_to_mfn(page);
+    va = map_domain_page(_mfn(mfn));
+    va = (void *)((unsigned long)va + ((unsigned long)addr & ~PAGE_MASK));
+
+    if ( !page_lock(page) )
+    {
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(page);
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    ol1e = *(l1_pgentry_t *)va;
+
+    /* Check that the virtual address supplied is actually mapped to frame. */
+    if ( unlikely(l1e_get_pfn(ol1e) != frame) )
+    {
+        page_unlock(page);
+        MEM_LOG("PTE entry %lx for address %"PRIx64" doesn't match frame %lx",
+                (unsigned long)l1e_get_intpte(ol1e), addr, frame);
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    /* Delete pagetable entry. */
+    if ( unlikely(!UPDATE_ENTRY
+                  (l1,
+                   (l1_pgentry_t *)va, ol1e, l1e_empty(), mfn,
+                   d->vcpu[0] /* Change if we go to per-vcpu shadows. */,
+                   0)) )
+    {
+        page_unlock(page);
+        MEM_LOG("Cannot delete PTE entry at %p", va);
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    page_unlock(page);
+
+ failed:
+    unmap_domain_page(va);
+    put_page(page);
+    return rc;
+}
+
+/* Read a PV guest's l1e that maps this virtual address. */
+static inline void guest_get_eff_l1e(unsigned long addr, l1_pgentry_t *eff_l1e)
+{
+    ASSERT(!paging_mode_translate(current->domain));
+    ASSERT(!paging_mode_external(current->domain));
+
+    if ( unlikely(!__addr_ok(addr)) ||
+         __copy_from_user(eff_l1e,
+                          &__linear_l1_table[l1_linear_offset(addr)],
+                          sizeof(l1_pgentry_t)) )
+        *eff_l1e = l1e_empty();
+}
+
+/*
+ * Read the guest's l1e that maps this address, from the kernel-mode
+ * page tables.
+ */
+static inline void guest_get_eff_kern_l1e(struct vcpu *v, unsigned long addr,
+                                          void *eff_l1e)
+{
+    bool_t user_mode = !(v->arch.flags & TF_kernel_mode);
+#define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
+
+    TOGGLE_MODE();
+    guest_get_eff_l1e(addr, eff_l1e);
+    TOGGLE_MODE();
+}
+
+static int destroy_grant_va_mapping(
+    unsigned long addr, unsigned long frame, struct vcpu *v)
+{
+    return replace_grant_va_mapping(addr, frame, l1e_empty(), v);
+}
+
+int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
+                             uint64_t new_addr, unsigned int flags)
+{
+    struct vcpu *curr = current;
+    l1_pgentry_t *pl1e, ol1e;
+    unsigned long gl1mfn;
+    struct page_info *l1pg;
+    int rc;
+
+    if ( flags & GNTMAP_contains_pte )
+    {
+        if ( !new_addr )
+            return destroy_grant_pte_mapping(addr, frame, curr->domain);
+
+        MEM_LOG("Unsupported grant table operation");
+        return GNTST_general_error;
+    }
+
+    if ( !new_addr )
+        return destroy_grant_va_mapping(addr, frame, curr);
+
+    pl1e = guest_map_l1e(new_addr, &gl1mfn);
+    if ( !pl1e )
+    {
+        MEM_LOG("Could not find L1 PTE for address %lx",
+                (unsigned long)new_addr);
+        return GNTST_general_error;
+    }
+
+    if ( !get_page_from_pagenr(gl1mfn, current->domain) )
+    {
+        guest_unmap_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    l1pg = mfn_to_page(gl1mfn);
+    if ( !page_lock(l1pg) )
+    {
+        put_page(l1pg);
+        guest_unmap_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(l1pg);
+        put_page(l1pg);
+        guest_unmap_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    ol1e = *pl1e;
+
+    if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, l1e_empty(),
+                                gl1mfn, curr, 0)) )
+    {
+        page_unlock(l1pg);
+        put_page(l1pg);
+        MEM_LOG("Cannot delete PTE entry at %p", (unsigned long *)pl1e);
+        guest_unmap_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    page_unlock(l1pg);
+    put_page(l1pg);
+    guest_unmap_l1e(pl1e);
+
+    rc = replace_grant_va_mapping(addr, frame, ol1e, curr);
+    if ( rc && !paging_mode_refcounts(curr->domain) )
+        put_page_from_l1e(ol1e, curr->domain);
+
+    return rc;
+}
+
+int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
+                            unsigned int flags, unsigned int cache_flags)
+{
+    l1_pgentry_t pte;
+    uint32_t grant_pte_flags;
+
+    grant_pte_flags =
+        _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_GNTTAB;
+    if ( cpu_has_nx )
+        grant_pte_flags |= _PAGE_NX_BIT;
+
+    pte = l1e_from_pfn(frame, grant_pte_flags);
+    if ( (flags & GNTMAP_application_map) )
+        l1e_add_flags(pte, _PAGE_USER);
+    if ( !(flags & GNTMAP_readonly) )
+        l1e_add_flags(pte, _PAGE_RW);
+
+    l1e_add_flags(pte,
+                  ((flags >> _GNTMAP_guest_avail0) * _PAGE_AVAIL0)
+                   & _PAGE_AVAIL);
+
+    l1e_add_flags(pte, cacheattr_to_pte_flags(cache_flags >> 5));
+
+    if ( flags & GNTMAP_contains_pte )
+        return create_grant_pte_mapping(addr, pte, current);
+    return create_grant_va_mapping(addr, pte, current);
+}
+
+/* Map shadow page at offset @off. */
+int map_ldt_shadow_page(unsigned int off)
+{
+    struct vcpu *v = current;
+    struct domain *d = v->domain;
+    unsigned long gmfn;
+    struct page_info *page;
+    l1_pgentry_t l1e, nl1e;
+    unsigned long gva = v->arch.pv_vcpu.ldt_base + (off << PAGE_SHIFT);
+    int okay;
+
+    BUG_ON(unlikely(in_irq()));
+
+    if ( is_pv_32bit_domain(d) )
+        gva = (u32)gva;
+    guest_get_eff_kern_l1e(v, gva, &l1e);
+    if ( unlikely(!(l1e_get_flags(l1e) & _PAGE_PRESENT)) )
+        return 0;
+
+    gmfn = l1e_get_pfn(l1e);
+    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
+    if ( unlikely(!page) )
+        return 0;
+
+    okay = get_page_type(page, PGT_seg_desc_page);
+    if ( unlikely(!okay) )
+    {
+        put_page(page);
+        return 0;
+    }
+
+    nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(l1e) | _PAGE_RW);
+
+    spin_lock(&v->arch.pv_vcpu.shadow_ldt_lock);
+    l1e_write(&gdt_ldt_ptes(d, v)[off + 16], nl1e);
+    v->arch.pv_vcpu.shadow_ldt_mapcnt++;
+    spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock);
+
+    return 1;
+}
+
+int new_guest_cr3(unsigned long mfn)
+{
+    struct vcpu *curr = current;
+    struct domain *d = curr->domain;
+    int rc;
+    unsigned long old_base_mfn;
+
+    if ( is_pv_32bit_domain(d) )
+    {
+        unsigned long gt_mfn = pagetable_get_pfn(curr->arch.guest_table);
+        l4_pgentry_t *pl4e = map_domain_page(_mfn(gt_mfn));
+
+        rc = paging_mode_refcounts(d)
+             ? -EINVAL /* Old code was broken, but what should it be? */
+             : mod_l4_entry(
+                    pl4e,
+                    l4e_from_pfn(
+                        mfn,
+                        (_PAGE_PRESENT|_PAGE_RW|_PAGE_USER|_PAGE_ACCESSED)),
+                    gt_mfn, 0, curr);
+        unmap_domain_page(pl4e);
+        switch ( rc )
+        {
+        case 0:
+            break;
+        case -EINTR:
+        case -ERESTART:
+            return -ERESTART;
+        default:
+            MEM_LOG("Error while installing new compat baseptr %lx", mfn);
+            return rc;
+        }
+
+        invalidate_shadow_ldt(curr, 0);
+        write_ptbase(curr);
+
+        return 0;
+    }
+
+    rc = put_old_guest_table(curr);
+    if ( unlikely(rc) )
+        return rc;
+
+    old_base_mfn = pagetable_get_pfn(curr->arch.guest_table);
+    /*
+     * This is particularly important when getting restarted after the
+     * previous attempt got preempted in the put-old-MFN phase.
+     */
+    if ( old_base_mfn == mfn )
+    {
+        write_ptbase(curr);
+        return 0;
+    }
+
+    rc = paging_mode_refcounts(d)
+         ? (get_page_from_pagenr(mfn, d) ? 0 : -EINVAL)
+         : get_page_and_type_from_pagenr(mfn, PGT_root_page_table, d, 0, 1);
+    switch ( rc )
+    {
+    case 0:
+        break;
+    case -EINTR:
+    case -ERESTART:
+        return -ERESTART;
+    default:
+        MEM_LOG("Error while installing new baseptr %lx", mfn);
+        return rc;
+    }
+
+    invalidate_shadow_ldt(curr, 0);
+
+    if ( !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) )
+        fill_ro_mpt(mfn);
+    curr->arch.guest_table = pagetable_from_pfn(mfn);
+    update_cr3(curr);
+
+    write_ptbase(curr);
+
+    if ( likely(old_base_mfn != 0) )
+    {
+        struct page_info *page = mfn_to_page(old_base_mfn);
+
+        if ( paging_mode_refcounts(d) )
+            put_page(page);
+        else
+            switch ( rc = put_page_and_type_preemptible(page) )
+            {
+            case -EINTR:
+                rc = -ERESTART;
+                /* fallthrough */
+            case -ERESTART:
+                curr->arch.old_guest_table = page;
+                break;
+            default:
+                BUG_ON(rc);
+                break;
+            }
+    }
+
+    return rc;
+}
+
+/*************************
+ * Writable Pagetables
+ */
+
+struct ptwr_emulate_ctxt {
+    struct x86_emulate_ctxt ctxt;
+    unsigned long cr2;
+    l1_pgentry_t  pte;
+};
+
+static int ptwr_emulated_read(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_data,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    unsigned int rc = bytes;
+    unsigned long addr = offset;
+
+    if ( !__addr_ok(addr) ||
+         (rc = __copy_from_user(p_data, (void *)addr, bytes)) )
+    {
+        x86_emul_pagefault(0, addr + bytes - rc, ctxt);  /* Read fault. */
+        return X86EMUL_EXCEPTION;
+    }
+
+    return X86EMUL_OKAY;
+}
+
+static int ptwr_emulated_update(
+    unsigned long addr,
+    paddr_t old,
+    paddr_t val,
+    unsigned int bytes,
+    unsigned int do_cmpxchg,
+    struct ptwr_emulate_ctxt *ptwr_ctxt)
+{
+    unsigned long mfn;
+    unsigned long unaligned_addr = addr;
+    struct page_info *page;
+    l1_pgentry_t pte, ol1e, nl1e, *pl1e;
+    struct vcpu *v = current;
+    struct domain *d = v->domain;
+    int ret;
+
+    /* Only allow naturally-aligned stores within the original %cr2 page. */
+    if ( unlikely(((addr^ptwr_ctxt->cr2) & PAGE_MASK) || (addr & (bytes-1))) )
+    {
+        MEM_LOG("ptwr_emulate: bad access (cr2=%lx, addr=%lx, bytes=%u)",
+                ptwr_ctxt->cr2, addr, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    /* Turn a sub-word access into a full-word access. */
+    if ( bytes != sizeof(paddr_t) )
+    {
+        paddr_t      full;
+        unsigned int rc, offset = addr & (sizeof(paddr_t)-1);
+
+        /* Align address; read full word. */
+        addr &= ~(sizeof(paddr_t)-1);
+        if ( (rc = copy_from_user(&full, (void *)addr, sizeof(paddr_t))) != 0 )
+        {
+            x86_emul_pagefault(0, /* Read fault. */
+                               addr + sizeof(paddr_t) - rc,
+                               &ptwr_ctxt->ctxt);
+            return X86EMUL_EXCEPTION;
+        }
+        /* Mask out bits provided by caller. */
+        full &= ~((((paddr_t)1 << (bytes*8)) - 1) << (offset*8));
+        /* Shift the caller value and OR in the missing bits. */
+        val  &= (((paddr_t)1 << (bytes*8)) - 1);
+        val <<= (offset)*8;
+        val  |= full;
+        /* Also fill in missing parts of the cmpxchg old value. */
+        old  &= (((paddr_t)1 << (bytes*8)) - 1);
+        old <<= (offset)*8;
+        old  |= full;
+    }
+
+    pte  = ptwr_ctxt->pte;
+    mfn  = l1e_get_pfn(pte);
+    page = mfn_to_page(mfn);
+
+    /* We are looking only for read-only mappings of p.t. pages. */
+    ASSERT((l1e_get_flags(pte) & (_PAGE_RW|_PAGE_PRESENT)) == _PAGE_PRESENT);
+    ASSERT(mfn_valid(_mfn(mfn)));
+    ASSERT((page->u.inuse.type_info & PGT_type_mask) == PGT_l1_page_table);
+    ASSERT((page->u.inuse.type_info & PGT_count_mask) != 0);
+    ASSERT(page_get_owner(page) == d);
+
+    /* Check the new PTE. */
+    nl1e = l1e_from_intpte(val);
+    switch ( ret = get_page_from_l1e(nl1e, d, d) )
+    {
+    default:
+        if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
+             !do_cmpxchg && (l1e_get_flags(nl1e) & _PAGE_PRESENT) )
+        {
+            /*
+             * If this is an upper-half write to a PAE PTE then we assume that
+             * the guest has simply got the two writes the wrong way round. We
+             * zap the PRESENT bit on the assumption that the bottom half will
+             * be written immediately after we return to the guest.
+             */
+            gdprintk(XENLOG_DEBUG, "ptwr_emulate: fixing up invalid PAE PTE %"
+                     PRIpte"\n", l1e_get_intpte(nl1e));
+            l1e_remove_flags(nl1e, _PAGE_PRESENT);
+        }
+        else
+        {
+            MEM_LOG("ptwr_emulate: could not get_page_from_l1e()");
+            return X86EMUL_UNHANDLEABLE;
+        }
+        break;
+    case 0:
+        break;
+    case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+        ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+        l1e_flip_flags(nl1e, ret);
+        break;
+    }
+
+    adjust_guest_l1e(nl1e, d);
+
+    /* Checked successfully: do the update (write or cmpxchg). */
+    pl1e = map_domain_page(_mfn(mfn));
+    pl1e = (l1_pgentry_t *)((unsigned long)pl1e + (addr & ~PAGE_MASK));
+    if ( do_cmpxchg )
+    {
+        int okay;
+        intpte_t t = old;
+        ol1e = l1e_from_intpte(old);
+
+        okay = paging_cmpxchg_guest_entry(v, &l1e_get_intpte(*pl1e),
+                                          &t, l1e_get_intpte(nl1e), _mfn(mfn));
+        okay = (okay && t == old);
+
+        if ( !okay )
+        {
+            unmap_domain_page(pl1e);
+            put_page_from_l1e(nl1e, d);
+            return X86EMUL_RETRY;
+        }
+    }
+    else
+    {
+        ol1e = *pl1e;
+        if ( !UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn, v, 0) )
+            BUG();
+    }
+
+    trace_ptwr_emulation(addr, nl1e);
+
+    unmap_domain_page(pl1e);
+
+    /* Finally, drop the old PTE. */
+    put_page_from_l1e(ol1e, d);
+
+    return X86EMUL_OKAY;
+}
+
+static int ptwr_emulated_write(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_data,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    paddr_t val = 0;
+
+    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) || !bytes )
+    {
+        MEM_LOG("ptwr_emulate: bad write size (addr=%lx, bytes=%u)",
+                offset, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    memcpy(&val, p_data, bytes);
+
+    return ptwr_emulated_update(
+        offset, 0, val, bytes, 0,
+        container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
+}
+
+static int ptwr_emulated_cmpxchg(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_old,
+    void *p_new,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    paddr_t old = 0, new = 0;
+
+    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) )
+    {
+        MEM_LOG("ptwr_emulate: bad cmpxchg size (addr=%lx, bytes=%u)",
+                offset, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    memcpy(&old, p_old, bytes);
+    memcpy(&new, p_new, bytes);
+
+    return ptwr_emulated_update(
+        offset, old, new, bytes, 1,
+        container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
+}
+
+static int pv_emul_is_mem_write(const struct x86_emulate_state *state,
+                                struct x86_emulate_ctxt *ctxt)
+{
+    return x86_insn_is_mem_write(state, ctxt) ? X86EMUL_OKAY
+                                              : X86EMUL_UNHANDLEABLE;
+}
+
+static const struct x86_emulate_ops ptwr_emulate_ops = {
+    .read       = ptwr_emulated_read,
+    .insn_fetch = ptwr_emulated_read,
+    .write      = ptwr_emulated_write,
+    .cmpxchg    = ptwr_emulated_cmpxchg,
+    .validate   = pv_emul_is_mem_write,
+    .cpuid      = pv_emul_cpuid,
+};
+
+/* Write page fault handler: check if guest is trying to modify a PTE. */
+int ptwr_do_page_fault(struct vcpu *v, unsigned long addr,
+                       struct cpu_user_regs *regs)
+{
+    struct domain *d = v->domain;
+    struct page_info *page;
+    l1_pgentry_t      pte;
+    struct ptwr_emulate_ctxt ptwr_ctxt = {
+        .ctxt = {
+            .regs = regs,
+            .vendor = d->arch.cpuid->x86_vendor,
+            .addr_size = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
+            .sp_size   = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
+            .swint_emulate = x86_swint_emulate_none,
+        },
+    };
+    int rc;
+
+    /* Attempt to read the PTE that maps the VA being accessed. */
+    guest_get_eff_l1e(addr, &pte);
+
+    /* We are looking only for read-only mappings of p.t. pages. */
+    if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) ||
+         rangeset_contains_singleton(mmio_ro_ranges, l1e_get_pfn(pte)) ||
+         !get_page_from_pagenr(l1e_get_pfn(pte), d) )
+        goto bail;
+
+    page = l1e_get_page(pte);
+    if ( !page_lock(page) )
+    {
+        put_page(page);
+        goto bail;
+    }
+
+    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(page);
+        put_page(page);
+        goto bail;
+    }
+
+    ptwr_ctxt.cr2 = addr;
+    ptwr_ctxt.pte = pte;
+
+    rc = x86_emulate(&ptwr_ctxt.ctxt, &ptwr_emulate_ops);
+
+    page_unlock(page);
+    put_page(page);
+
+    switch ( rc )
+    {
+    case X86EMUL_EXCEPTION:
+        /*
+         * This emulation only covers writes to pagetables which are marked
+         * read-only by Xen.  We tolerate #PF (in case a concurrent pagetable
+         * update has succeeded on a different vcpu).  Anything else is an
+         * emulation bug, or a guest playing with the instruction stream under
+         * Xen's feet.
+         */
+        if ( ptwr_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
+             ptwr_ctxt.ctxt.event.vector == TRAP_page_fault )
+            pv_inject_event(&ptwr_ctxt.ctxt.event);
+        else
+            gdprintk(XENLOG_WARNING,
+                     "Unexpected event (type %u, vector %#x) from emulation\n",
+                     ptwr_ctxt.ctxt.event.type, ptwr_ctxt.ctxt.event.vector);
+
+        /* Fallthrough */
+    case X86EMUL_OKAY:
+
+        if ( ptwr_ctxt.ctxt.retire.singlestep )
+            pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
+
+        /* Fallthrough */
+    case X86EMUL_RETRY:
+        perfc_incr(ptwr_emulations);
+        return EXCRET_fault_fixed;
+    }
+
+ bail:
+    return 0;
+}
+
+
+/*************************
+ * fault handling for read-only MMIO pages
+ */
+
+int mmio_ro_emulated_write(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_data,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    struct mmio_ro_emulate_ctxt *mmio_ro_ctxt = ctxt->data;
+
+    /* Only allow naturally-aligned stores at the original %cr2 address. */
+    if ( ((bytes | offset) & (bytes - 1)) || !bytes ||
+         offset != mmio_ro_ctxt->cr2 )
+    {
+        MEM_LOG("mmio_ro_emulate: bad access (cr2=%lx, addr=%lx, bytes=%u)",
+                mmio_ro_ctxt->cr2, offset, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    return X86EMUL_OKAY;
+}
+
+static const struct x86_emulate_ops mmio_ro_emulate_ops = {
+    .read       = x86emul_unhandleable_rw,
+    .insn_fetch = ptwr_emulated_read,
+    .write      = mmio_ro_emulated_write,
+    .validate   = pv_emul_is_mem_write,
+    .cpuid      = pv_emul_cpuid,
+};
+
+int mmcfg_intercept_write(
+    enum x86_segment seg,
+    unsigned long offset,
+    void *p_data,
+    unsigned int bytes,
+    struct x86_emulate_ctxt *ctxt)
+{
+    struct mmio_ro_emulate_ctxt *mmio_ctxt = ctxt->data;
+
+    /*
+     * Only allow naturally-aligned stores no wider than 4 bytes to the
+     * original %cr2 address.
+     */
+    if ( ((bytes | offset) & (bytes - 1)) || bytes > 4 || !bytes ||
+         offset != mmio_ctxt->cr2 )
+    {
+        MEM_LOG("mmcfg_intercept: bad write (cr2=%lx, addr=%lx, bytes=%u)",
+                mmio_ctxt->cr2, offset, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    offset &= 0xfff;
+    if ( pci_conf_write_intercept(mmio_ctxt->seg, mmio_ctxt->bdf,
+                                  offset, bytes, p_data) >= 0 )
+        pci_mmcfg_write(mmio_ctxt->seg, PCI_BUS(mmio_ctxt->bdf),
+                        PCI_DEVFN2(mmio_ctxt->bdf), offset, bytes,
+                        *(uint32_t *)p_data);
+
+    return X86EMUL_OKAY;
+}
+
+static const struct x86_emulate_ops mmcfg_intercept_ops = {
+    .read       = x86emul_unhandleable_rw,
+    .insn_fetch = ptwr_emulated_read,
+    .write      = mmcfg_intercept_write,
+    .validate   = pv_emul_is_mem_write,
+    .cpuid      = pv_emul_cpuid,
+};
+
+/* Check if guest is trying to modify a r/o MMIO page. */
+int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr,
+                          struct cpu_user_regs *regs)
+{
+    l1_pgentry_t pte;
+    unsigned long mfn;
+    unsigned int addr_size = is_pv_32bit_vcpu(v) ? 32 : BITS_PER_LONG;
+    struct mmio_ro_emulate_ctxt mmio_ro_ctxt = { .cr2 = addr };
+    struct x86_emulate_ctxt ctxt = {
+        .regs = regs,
+        .vendor = v->domain->arch.cpuid->x86_vendor,
+        .addr_size = addr_size,
+        .sp_size = addr_size,
+        .swint_emulate = x86_swint_emulate_none,
+        .data = &mmio_ro_ctxt
+    };
+    int rc;
+
+    /* Attempt to read the PTE that maps the VA being accessed. */
+    guest_get_eff_l1e(addr, &pte);
+
+    /* We are looking only for read-only mappings of MMIO pages. */
+    if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) )
+        return 0;
+
+    mfn = l1e_get_pfn(pte);
+    if ( mfn_valid(_mfn(mfn)) )
+    {
+        struct page_info *page = mfn_to_page(mfn);
+        struct domain *owner = page_get_owner_and_reference(page);
+
+        if ( owner )
+            put_page(page);
+        if ( owner != dom_io )
+            return 0;
+    }
+
+    if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+        return 0;
+
+    if ( pci_ro_mmcfg_decode(mfn, &mmio_ro_ctxt.seg, &mmio_ro_ctxt.bdf) )
+        rc = x86_emulate(&ctxt, &mmcfg_intercept_ops);
+    else
+        rc = x86_emulate(&ctxt, &mmio_ro_emulate_ops);
+
+    switch ( rc )
+    {
+    case X86EMUL_EXCEPTION:
+        /*
+         * This emulation only covers writes to MMCFG space or read-only MFNs.
+         * We tolerate #PF (from hitting an adjacent page or a successful
+         * concurrent pagetable update).  Anything else is an emulation bug,
+         * or a guest playing with the instruction stream under Xen's feet.
+         */
+        if ( ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
+             ctxt.event.vector == TRAP_page_fault )
+            pv_inject_event(&ctxt.event);
+        else
+            gdprintk(XENLOG_WARNING,
+                     "Unexpected event (type %u, vector %#x) from emulation\n",
+                     ctxt.event.type, ctxt.event.vector);
+
+        /* Fallthrough */
+    case X86EMUL_OKAY:
+
+        if ( ctxt.retire.singlestep )
+            pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
+
+        /* Fallthrough */
+    case X86EMUL_RETRY:
+        perfc_incr(ptwr_emulations);
+        return EXCRET_fault_fixed;
+    }
+
+    return 0;
+}
+
+/*************************
+ * PV MMU hypercalls
+ */
+long do_mmu_update(
+    XEN_GUEST_HANDLE_PARAM(mmu_update_t) ureqs,
+    unsigned int count,
+    XEN_GUEST_HANDLE_PARAM(uint) pdone,
+    unsigned int foreigndom)
+{
+    struct mmu_update req;
+    void *va;
+    unsigned long gpfn, gmfn, mfn;
+    struct page_info *page;
+    unsigned int cmd, i = 0, done = 0, pt_dom;
+    struct vcpu *curr = current, *v = curr;
+    struct domain *d = v->domain, *pt_owner = d, *pg_owner;
+    struct domain_mmap_cache mapcache;
+    uint32_t xsm_needed = 0;
+    uint32_t xsm_checked = 0;
+    int rc = put_old_guest_table(curr);
+
+    if ( unlikely(rc) )
+    {
+        if ( likely(rc == -ERESTART) )
+            rc = hypercall_create_continuation(
+                     __HYPERVISOR_mmu_update, "hihi", ureqs, count, pdone,
+                     foreigndom);
+        return rc;
+    }
+
+    if ( unlikely(count == MMU_UPDATE_PREEMPTED) &&
+         likely(guest_handle_is_null(ureqs)) )
+    {
+        /* See the curr->arch.old_guest_table related
+         * hypercall_create_continuation() below. */
+        return (int)foreigndom;
+    }
+
+    if ( unlikely(count & MMU_UPDATE_PREEMPTED) )
+    {
+        count &= ~MMU_UPDATE_PREEMPTED;
+        if ( unlikely(!guest_handle_is_null(pdone)) )
+            (void)copy_from_guest(&done, pdone, 1);
+    }
+    else
+        perfc_incr(calls_to_mmu_update);
+
+    if ( unlikely(!guest_handle_okay(ureqs, count)) )
+        return -EFAULT;
+
+    if ( (pt_dom = foreigndom >> 16) != 0 )
+    {
+        /* Pagetables belong to a foreign domain (PFD). */
+        if ( (pt_owner = rcu_lock_domain_by_id(pt_dom - 1)) == NULL )
+            return -ESRCH;
+
+        if ( pt_owner == d )
+            rcu_unlock_domain(pt_owner);
+        else if ( !pt_owner->vcpu || (v = pt_owner->vcpu[0]) == NULL )
+        {
+            rc = -EINVAL;
+            goto out;
+        }
+    }
+
+    if ( (pg_owner = mm_get_pg_owner((uint16_t)foreigndom)) == NULL )
+    {
+        rc = -ESRCH;
+        goto out;
+    }
+
+    domain_mmap_cache_init(&mapcache);
+
+    for ( i = 0; i < count; i++ )
+    {
+        if ( curr->arch.old_guest_table || (i && hypercall_preempt_check()) )
+        {
+            rc = -ERESTART;
+            break;
+        }
+
+        if ( unlikely(__copy_from_guest(&req, ureqs, 1) != 0) )
+        {
+            MEM_LOG("Bad __copy_from_guest");
+            rc = -EFAULT;
+            break;
+        }
+
+        cmd = req.ptr & (sizeof(l1_pgentry_t)-1);
+
+        switch ( cmd )
+        {
+            /*
+             * MMU_NORMAL_PT_UPDATE: Normal update to any level of page table.
+             * MMU_PT_UPDATE_PRESERVE_AD: As above but also preserve (OR)
+             * current A/D bits.
+             */
+        case MMU_NORMAL_PT_UPDATE:
+        case MMU_PT_UPDATE_PRESERVE_AD:
+        {
+            p2m_type_t p2mt;
+
+            rc = -EOPNOTSUPP;
+            if ( unlikely(paging_mode_refcounts(pt_owner)) )
+                break;
+
+            xsm_needed |= XSM_MMU_NORMAL_UPDATE;
+            if ( get_pte_flags(req.val) & _PAGE_PRESENT )
+            {
+                xsm_needed |= XSM_MMU_UPDATE_READ;
+                if ( get_pte_flags(req.val) & _PAGE_RW )
+                    xsm_needed |= XSM_MMU_UPDATE_WRITE;
+            }
+            if ( xsm_needed != xsm_checked )
+            {
+                rc = xsm_mmu_update(XSM_TARGET, d, pt_owner, pg_owner, xsm_needed);
+                if ( rc )
+                    break;
+                xsm_checked = xsm_needed;
+            }
+            rc = -EINVAL;
+
+            req.ptr -= cmd;
+            gmfn = req.ptr >> PAGE_SHIFT;
+            page = get_page_from_gfn(pt_owner, gmfn, &p2mt, P2M_ALLOC);
+
+            if ( p2m_is_paged(p2mt) )
+            {
+                ASSERT(!page);
+                p2m_mem_paging_populate(pg_owner, gmfn);
+                rc = -ENOENT;
+                break;
+            }
+
+            if ( unlikely(!page) )
+            {
+                MEM_LOG("Could not get page for normal update");
+                break;
+            }
+
+            mfn = page_to_mfn(page);
+            va = map_domain_page_with_cache(mfn, &mapcache);
+            va = (void *)((unsigned long)va +
+                          (unsigned long)(req.ptr & ~PAGE_MASK));
+
+            if ( page_lock(page) )
+            {
+                switch ( page->u.inuse.type_info & PGT_type_mask )
+                {
+                case PGT_l1_page_table:
+                {
+                    l1_pgentry_t l1e = l1e_from_intpte(req.val);
+                    p2m_type_t l1e_p2mt = p2m_ram_rw;
+                    struct page_info *target = NULL;
+                    p2m_query_t q = (l1e_get_flags(l1e) & _PAGE_RW) ?
+                                        P2M_UNSHARE : P2M_ALLOC;
+
+                    if ( paging_mode_translate(pg_owner) )
+                        target = get_page_from_gfn(pg_owner, l1e_get_pfn(l1e),
+                                                   &l1e_p2mt, q);
+
+                    if ( p2m_is_paged(l1e_p2mt) )
+                    {
+                        if ( target )
+                            put_page(target);
+                        p2m_mem_paging_populate(pg_owner, l1e_get_pfn(l1e));
+                        rc = -ENOENT;
+                        break;
+                    }
+                    else if ( p2m_ram_paging_in == l1e_p2mt && !target )
+                    {
+                        rc = -ENOENT;
+                        break;
+                    }
+                    /* If we tried to unshare and failed */
+                    else if ( (q & P2M_UNSHARE) && p2m_is_shared(l1e_p2mt) )
+                    {
+                        /* We could not have obtained a page ref. */
+                        ASSERT(target == NULL);
+                        /* And mem_sharing_notify has already been called. */
+                        rc = -ENOMEM;
+                        break;
+                    }
+
+                    rc = mod_l1_entry(va, l1e, mfn,
+                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v,
+                                      pg_owner);
+                    if ( target )
+                        put_page(target);
+                }
+                break;
+                case PGT_l2_page_table:
+                    rc = mod_l2_entry(va, l2e_from_intpte(req.val), mfn,
+                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
+                    break;
+                case PGT_l3_page_table:
+                    rc = mod_l3_entry(va, l3e_from_intpte(req.val), mfn,
+                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
+                    break;
+                case PGT_l4_page_table:
+                    rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
+                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
+                    break;
+                case PGT_writable_page:
+                    perfc_incr(writable_mmu_updates);
+                    if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) )
+                        rc = 0;
+                    break;
+                }
+                page_unlock(page);
+                if ( rc == -EINTR )
+                    rc = -ERESTART;
+            }
+            else if ( get_page_type(page, PGT_writable_page) )
+            {
+                perfc_incr(writable_mmu_updates);
+                if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) )
+                    rc = 0;
+                put_page_type(page);
+            }
+
+            unmap_domain_page_with_cache(va, &mapcache);
+            put_page(page);
+        }
+        break;
+
+        case MMU_MACHPHYS_UPDATE:
+            if ( unlikely(d != pt_owner) )
+            {
+                rc = -EPERM;
+                break;
+            }
+
+            if ( unlikely(paging_mode_translate(pg_owner)) )
+            {
+                rc = -EINVAL;
+                break;
+            }
+
+            mfn = req.ptr >> PAGE_SHIFT;
+            gpfn = req.val;
+
+            xsm_needed |= XSM_MMU_MACHPHYS_UPDATE;
+            if ( xsm_needed != xsm_checked )
+            {
+                rc = xsm_mmu_update(XSM_TARGET, d, NULL, pg_owner, xsm_needed);
+                if ( rc )
+                    break;
+                xsm_checked = xsm_needed;
+            }
+
+            if ( unlikely(!get_page_from_pagenr(mfn, pg_owner)) )
+            {
+                MEM_LOG("Could not get page for mach->phys update");
+                rc = -EINVAL;
+                break;
+            }
+
+            set_gpfn_from_mfn(mfn, gpfn);
+
+            paging_mark_dirty(pg_owner, _mfn(mfn));
+
+            put_page(mfn_to_page(mfn));
+            break;
+
+        default:
+            MEM_LOG("Invalid page update command %x", cmd);
+            rc = -ENOSYS;
+            break;
+        }
+
+        if ( unlikely(rc) )
+            break;
+
+        guest_handle_add_offset(ureqs, 1);
+    }
+
+    if ( rc == -ERESTART )
+    {
+        ASSERT(i < count);
+        rc = hypercall_create_continuation(
+            __HYPERVISOR_mmu_update, "hihi",
+            ureqs, (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom);
+    }
+    else if ( curr->arch.old_guest_table )
+    {
+        XEN_GUEST_HANDLE_PARAM(void) null;
+
+        ASSERT(rc || i == count);
+        set_xen_guest_handle(null, NULL);
+        /*
+         * In order to have a way to communicate the final return value to
+         * our continuation, we pass this in place of "foreigndom", building
+         * on the fact that this argument isn't needed anymore.
+         */
+        rc = hypercall_create_continuation(
+                __HYPERVISOR_mmu_update, "hihi", null,
+                MMU_UPDATE_PREEMPTED, null, rc);
+    }
+
+    mm_put_pg_owner(pg_owner);
+
+    domain_mmap_cache_destroy(&mapcache);
+
+    perfc_add(num_page_updates, i);
+
+ out:
+    if ( pt_owner != d )
+        rcu_unlock_domain(pt_owner);
+
+    /* Add incremental work we have done to the @done output parameter. */
+    if ( unlikely(!guest_handle_is_null(pdone)) )
+    {
+        done += i;
+        copy_to_guest(pdone, &done, 1);
+    }
+
+    return rc;
+}
+
+static int __do_update_va_mapping(
+    unsigned long va, u64 val64, unsigned long flags, struct domain *pg_owner)
+{
+    l1_pgentry_t   val = l1e_from_intpte(val64);
+    struct vcpu   *v   = current;
+    struct domain *d   = v->domain;
+    struct page_info *gl1pg;
+    l1_pgentry_t  *pl1e;
+    unsigned long  bmap_ptr, gl1mfn;
+    cpumask_t     *mask = NULL;
+    int            rc;
+
+    perfc_incr(calls_to_update_va);
+
+    rc = xsm_update_va_mapping(XSM_TARGET, d, pg_owner, val);
+    if ( rc )
+        return rc;
+
+    rc = -EINVAL;
+    pl1e = guest_map_l1e(va, &gl1mfn);
+    if ( unlikely(!pl1e || !get_page_from_pagenr(gl1mfn, d)) )
+        goto out;
+
+    gl1pg = mfn_to_page(gl1mfn);
+    if ( !page_lock(gl1pg) )
+    {
+        put_page(gl1pg);
+        goto out;
+    }
+
+    if ( (gl1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(gl1pg);
+        put_page(gl1pg);
+        goto out;
+    }
+
+    rc = mod_l1_entry(pl1e, val, gl1mfn, 0, v, pg_owner);
+
+    page_unlock(gl1pg);
+    put_page(gl1pg);
+
+ out:
+    if ( pl1e )
+        guest_unmap_l1e(pl1e);
+
+    switch ( flags & UVMF_FLUSHTYPE_MASK )
+    {
+    case UVMF_TLB_FLUSH:
+        switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) )
+        {
+        case UVMF_LOCAL:
+            flush_tlb_local();
+            break;
+        case UVMF_ALL:
+            mask = d->domain_dirty_cpumask;
+            break;
+        default:
+            mask = this_cpu(scratch_cpumask);
+            rc = mm_vcpumask_to_pcpumask(d,
+                                         const_guest_handle_from_ptr(bmap_ptr,
+                                                                     void),
+                                         mask);
+            break;
+        }
+        if ( mask )
+            flush_tlb_mask(mask);
+        break;
+
+    case UVMF_INVLPG:
+        switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) )
+        {
+        case UVMF_LOCAL:
+            paging_invlpg(v, va);
+            break;
+        case UVMF_ALL:
+            mask = d->domain_dirty_cpumask;
+            break;
+        default:
+            mask = this_cpu(scratch_cpumask);
+            rc = mm_vcpumask_to_pcpumask(d,
+                                         const_guest_handle_from_ptr(bmap_ptr,
+                                                                     void),
+                                         mask);
+            break;
+        }
+        if ( mask )
+            flush_tlb_one_mask(mask, va);
+        break;
+    }
+
+    return rc;
+}
+
+long do_update_va_mapping(unsigned long va, u64 val64,
+                          unsigned long flags)
+{
+    return __do_update_va_mapping(va, val64, flags, current->domain);
+}
+
+long do_update_va_mapping_otherdomain(unsigned long va, u64 val64,
+                                      unsigned long flags,
+                                      domid_t domid)
+{
+    struct domain *pg_owner;
+    int rc;
+
+    if ( (pg_owner = mm_get_pg_owner(domid)) == NULL )
+        return -ESRCH;
+
+    rc = __do_update_va_mapping(va, val64, flags, pg_owner);
+
+    mm_put_pg_owner(pg_owner);
+
+    return rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 967b7fcda9..170908f7f2 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -649,6 +649,11 @@ int mm_vcpumask_to_pcpumask(struct domain *d,
                             XEN_GUEST_HANDLE_PARAM(const_void) bmap,
                             cpumask_t *pmask);
 
+int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
+                            unsigned int flags, unsigned int cache_flags);
+int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
+                             uint64_t new_addr, unsigned int flags);
+
 #define MEM_LOG(_f, _a...) gdprintk(XENLOG_WARNING , _f "\n" , ## _a)
 
 #define PAGE_CACHE_ATTRS (_PAGE_PAT|_PAGE_PCD|_PAGE_PWT)
-- 
2.11.0



* [PATCH RFC 13/13] x86/mm: split HVM grant table code to hvm/mm.c
  2017-03-27  9:10 [PATCH RFC 00/13] Refactor x86/mm.c Wei Liu
                   ` (11 preceding siblings ...)
  2017-03-27  9:10 ` [PATCH RFC 12/13] x86/mm: split PV MMU code to pv/mm.c Wei Liu
@ 2017-03-27  9:10 ` Wei Liu
  12 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-03-27  9:10 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Jan Beulich

PG_external requires hardware support, so the code guarded by it is
HVM-only.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
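For reference, the dispatch functions that remain in mm.c end up reading
roughly as below -- a sketch of the post-split code, not a further change
made by this patch:

    int create_grant_host_mapping(uint64_t addr, unsigned long frame,
                                  unsigned int flags, unsigned int cache_flags)
    {
        /* Guests with PG_external set (i.e. HVM) get grant mappings via the p2m. */
        if ( paging_mode_external(current->domain) )
            return create_grant_p2m_mapping(addr, frame, flags, cache_flags);

        /* PV guests have the mapping written into their page tables. */
        return create_grant_pv_mapping(addr, frame, flags, cache_flags);
    }
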
 xen/arch/x86/hvm/Makefile |  1 +
 xen/arch/x86/hvm/mm.c     | 84 +++++++++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/mm.c         | 48 ---------------------------
 xen/include/asm-x86/mm.h  |  5 +++
 4 files changed, 90 insertions(+), 48 deletions(-)
 create mode 100644 xen/arch/x86/hvm/mm.c

diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
index 0a3d0f4f7e..3e8d362c8e 100644
--- a/xen/arch/x86/hvm/Makefile
+++ b/xen/arch/x86/hvm/Makefile
@@ -13,6 +13,7 @@ obj-y += intercept.o
 obj-y += io.o
 obj-y += ioreq.o
 obj-y += irq.o
+obj-y += mm.o
 obj-y += monitor.o
 obj-y += mtrr.o
 obj-y += nestedhvm.o
diff --git a/xen/arch/x86/hvm/mm.c b/xen/arch/x86/hvm/mm.c
new file mode 100644
index 0000000000..56a49238a5
--- /dev/null
+++ b/xen/arch/x86/hvm/mm.c
@@ -0,0 +1,84 @@
+/******************************************************************************
+ * arch/x86/hvm/mm.c
+ *
+ * Copyright (c) 2002-2005 K A Fraser
+ * Copyright (c) 2004 Christian Limpach
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/types.h>
+
+#include <public/grant_table.h>
+
+#include <asm/p2m.h>
+
+int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
+                             unsigned int flags,
+                             unsigned int cache_flags)
+{
+    p2m_type_t p2mt;
+    int rc;
+
+    if ( cache_flags || (flags & ~GNTMAP_readonly) != GNTMAP_host_map )
+        return GNTST_general_error;
+
+    if ( flags & GNTMAP_readonly )
+        p2mt = p2m_grant_map_ro;
+    else
+        p2mt = p2m_grant_map_rw;
+    rc = guest_physmap_add_entry(current->domain,
+                                 _gfn(addr >> PAGE_SHIFT),
+                                 _mfn(frame), PAGE_ORDER_4K, p2mt);
+    if ( rc )
+        return GNTST_general_error;
+    else
+        return GNTST_okay;
+}
+
+int replace_grant_p2m_mapping(uint64_t addr, unsigned long frame,
+                              uint64_t new_addr, unsigned int flags)
+{
+    unsigned long gfn = (unsigned long)(addr >> PAGE_SHIFT);
+    p2m_type_t type;
+    mfn_t old_mfn;
+    struct domain *d = current->domain;
+
+    if ( new_addr != 0 || (flags & GNTMAP_contains_pte) )
+        return GNTST_general_error;
+
+    old_mfn = get_gfn(d, gfn, &type);
+    if ( !p2m_is_grant(type) || mfn_x(old_mfn) != frame )
+    {
+        put_gfn(d, gfn);
+        MEM_LOG("replace_grant_p2m_mapping: old mapping invalid (type %d, mfn %lx, frame %lx)",
+                type, mfn_x(old_mfn), frame);
+        return GNTST_general_error;
+    }
+    guest_physmap_remove_page(d, _gfn(gfn), _mfn(frame), PAGE_ORDER_4K);
+
+    put_gfn(d, gfn);
+    return GNTST_okay;
+}
+
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 0119cacc43..dc3d185cbe 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2964,29 +2964,6 @@ long do_mmuext_op(
     return rc;
 }
 
-static int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
-                                    unsigned int flags,
-                                    unsigned int cache_flags)
-{
-    p2m_type_t p2mt;
-    int rc;
-
-    if ( cache_flags  || (flags & ~GNTMAP_readonly) != GNTMAP_host_map )
-        return GNTST_general_error;
-
-    if ( flags & GNTMAP_readonly )
-        p2mt = p2m_grant_map_ro;
-    else
-        p2mt = p2m_grant_map_rw;
-    rc = guest_physmap_add_entry(current->domain,
-                                 _gfn(addr >> PAGE_SHIFT),
-                                 _mfn(frame), PAGE_ORDER_4K, p2mt);
-    if ( rc )
-        return GNTST_general_error;
-    else
-        return GNTST_okay;
-}
-
 int create_grant_host_mapping(uint64_t addr, unsigned long frame,
                               unsigned int flags, unsigned int cache_flags)
 {
@@ -2996,31 +2973,6 @@ int create_grant_host_mapping(uint64_t addr, unsigned long frame,
     return create_grant_pv_mapping(addr, frame, flags, cache_flags);
 }
 
-static int replace_grant_p2m_mapping(
-    uint64_t addr, unsigned long frame, uint64_t new_addr, unsigned int flags)
-{
-    unsigned long gfn = (unsigned long)(addr >> PAGE_SHIFT);
-    p2m_type_t type;
-    mfn_t old_mfn;
-    struct domain *d = current->domain;
-
-    if ( new_addr != 0 || (flags & GNTMAP_contains_pte) )
-        return GNTST_general_error;
-
-    old_mfn = get_gfn(d, gfn, &type);
-    if ( !p2m_is_grant(type) || mfn_x(old_mfn) != frame )
-    {
-        put_gfn(d, gfn);
-        MEM_LOG("replace_grant_p2m_mapping: old mapping invalid (type %d, mfn %lx, frame %lx)",
-                type, mfn_x(old_mfn), frame);
-        return GNTST_general_error;
-    }
-    guest_physmap_remove_page(d, _gfn(gfn), _mfn(frame), PAGE_ORDER_4K);
-
-    put_gfn(d, gfn);
-    return GNTST_okay;
-}
-
 int replace_grant_host_mapping(uint64_t addr, unsigned long frame,
                                uint64_t new_addr, unsigned int flags)
 {
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 170908f7f2..b366d870b1 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -653,6 +653,11 @@ int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
                             unsigned int flags, unsigned int cache_flags);
 int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
                              uint64_t new_addr, unsigned int flags);
+int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
+                             unsigned int flags,
+                             unsigned int cache_flags);
+int replace_grant_p2m_mapping(uint64_t addr, unsigned long frame,
+                              uint64_t new_addr, unsigned int flags);
 
 #define MEM_LOG(_f, _a...) gdprintk(XENLOG_WARNING , _f "\n" , ## _a)
 
-- 
2.11.0



* Re: [PATCH RFC 01/13] x86/mm: export {get, put}_pg_owner
  2017-03-27  9:10 ` [PATCH RFC 01/13] x86/mm: export {get,put}_pg_owner Wei Liu
@ 2017-03-28 21:11   ` Andrew Cooper
  2017-03-29  9:03     ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2017-03-28 21:11 UTC (permalink / raw)
  To: Wei Liu, Xen-devel; +Cc: George Dunlap, Tim Deegan, Jan Beulich

On 27/03/2017 10:10, Wei Liu wrote:
> Prefix them with "mm_" and add declarations to asm-x86/mm.h.
>
> They will be needed when we split PV specific code out of x86/mm.c.
>
> No functional change.
>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

I have to admit that I don't understand why they are called
{get,put}_pg_owner.  Perhaps very historical from Linux?

They are nothing to do with pages, and get a reference on the domain.

I'd recommend s/pg_owner/domain/ so the function calls actually indicate
what object is having the reference taken on it.

~Andrew


* Re: [PATCH RFC 02/13] x86/mm: move MEM_LOG to asm-x86/mm.h
  2017-03-27  9:10 ` [PATCH RFC 02/13] x86/mm: move MEM_LOG to asm-x86/mm.h Wei Liu
@ 2017-03-28 21:14   ` Andrew Cooper
  2017-03-29 10:50     ` Wei Liu
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2017-03-28 21:14 UTC (permalink / raw)
  To: Wei Liu, Xen-devel; +Cc: George Dunlap, Tim Deegan, Jan Beulich

On 27/03/2017 10:10, Wei Liu wrote:
> It will be needed later when we split mm.c.
>
> No functional change.
>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

I have a pending patch to take MEM_LOG() out entirely, along with
correcting some of the information printed.

I'd prefer not to perpetuate its existence.

~Andrew


* Re: [PATCH RFC 01/13] x86/mm: export {get, put}_pg_owner
  2017-03-28 21:11   ` [PATCH RFC 01/13] x86/mm: export {get, put}_pg_owner Andrew Cooper
@ 2017-03-29  9:03     ` Jan Beulich
  2017-03-29  9:10       ` Andrew Cooper
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2017-03-29  9:03 UTC (permalink / raw)
  To: Andrew Cooper, Wei Liu; +Cc: George Dunlap, Xen-devel, Tim Deegan

>>> On 28.03.17 at 23:11, <andrew.cooper3@citrix.com> wrote:
> On 27/03/2017 10:10, Wei Liu wrote:
>> Prefix them with "mm_" and add declarations to asm-x86/mm.h.
>>
>> They will be needed when we split PV specific code out of x86/mm.c.

Is that actually the case? They're about PV (target) domains, so
I'd kind of expect them to move together with the PV-only code,
even if the caller may not be PV.

>> No functional change.
>>
>> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> 
> I have to admit that I don't understand why they are called
> {get,put}_pg_owner.  Perhaps very historical from Linux?

I don't think any of this code has Linux origin.

> They are nothing to do with pages, and get a reference on the domain.

Depends on the perspective you take: For all of their callers,
they have precisely that meaning.

> I'd recommend s/pg_owner/domain/ so the function calls actually indicate
> what object is having the reference taken on it.

Well, to make clear what uses are legitimate, perhaps
s/pg_owner/foreign_domain/ (if you really continue to think
these should be renamed in the first place)? Using just "domain"
pretty clearly results in too generic names. Perhaps additionally
they should be prefixed mm_?

Jan



* Re: [PATCH RFC 01/13] x86/mm: export {get, put}_pg_owner
  2017-03-29  9:03     ` Jan Beulich
@ 2017-03-29  9:10       ` Andrew Cooper
  2017-03-30 10:07         ` Wei Liu
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2017-03-29  9:10 UTC (permalink / raw)
  To: Jan Beulich, Wei Liu; +Cc: George Dunlap, Xen-devel, Tim Deegan

On 29/03/17 10:03, Jan Beulich wrote:
>>>> On 28.03.17 at 23:11, <andrew.cooper3@citrix.com> wrote:
>> On 27/03/2017 10:10, Wei Liu wrote:
>>> Prefix them with "mm_" and add declarations to asm-x86/mm.h.
>>>
>>> They will be needed when we split PV specific code out of x86/mm.c.
> Is that actually the case? They're about PV (target) domains, so
> I'd kind of expect them to move together with the PV-only code,
> even if the caller may not be PV.
>
>>> No functional change.
>>>
>>> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
>> I have to admit that I don't understand why they are called
>> {get,put}_pg_owner.  Perhaps very historical from Linux?
> I don't think any of this code has Linux origin.
>
>> They are nothing to do with pages, and get a reference on the domain.
> Depends on the perspective you take: For all of their callers,
> they have precisely that meaning.
>
>> I'd recommend s/pg_owner/domain/ so the function calls actually indicate
>> what object is having the reference taken on it.
> Well, to make clear what uses are legitimate, perhaps
> s/pg_owner/foreign_domain/ (if you really continue to think
> these should be renamed in the first place)? Using just "domain"
> pretty clearly results in too generic names. Perhaps additionally
> they should be prefixed mm_?

Sorry - I had intended my suggestion to be in combination with the mm_
prefixes, so mm_{get,put}_domain().

~Andrew


* Re: [PATCH RFC 03/13] x86/mm: export vcpumask_to_pcpumask
  2017-03-27  9:10 ` [PATCH RFC 03/13] x86/mm: export vcpumask_to_pcpumask Wei Liu
@ 2017-03-29 10:15   ` Andrew Cooper
  2017-03-29 10:49     ` Wei Liu
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2017-03-29 10:15 UTC (permalink / raw)
  To: Wei Liu, Xen-devel; +Cc: George Dunlap, Tim Deegan, Jan Beulich

On 27/03/17 10:10, Wei Liu wrote:
> Prefix it with "mm_". No functional change.
>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
> It used to be a inline function, but I don't think there is no point in

Double negative.  "think there is any point" ?

mmuext_op was added to HVM guests in c/s 2204b1ce06ac for PVHv1. 
However, everything in it is PV specific, so I think it is reasonable to
make its reference in the HVM hypercall table conditional on CONFIG_PV,
and for it to move fully into pv/mm.c
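
i.e. something along these lines (a sketch only -- the exact table and
macro spellings in hvm/hypercall.c may differ from what's below):

/* Only wire up the PV MMU hypercall when PV support is built in. */
static const hypercall_table_t hvm_hypercall_table[] = {
    /* ... */
#ifdef CONFIG_PV
    HYPERCALL(mmuext_op),   /* PV page table interface; pointless without PV */
#endif
    /* ... */
};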

At that point, this function can move as well.

~Andrew


* Re: [PATCH RFC 04/13] x86/mm: carve out create_grant_pv_mapping
  2017-03-27  9:10 ` [PATCH RFC 04/13] x86/mm: carve out create_grant_pv_mapping Wei Liu
@ 2017-03-29 10:27   ` Andrew Cooper
  2017-03-29 10:45     ` Jan Beulich
  2017-04-03  8:40     ` Wei Liu
  0 siblings, 2 replies; 30+ messages in thread
From: Andrew Cooper @ 2017-03-29 10:27 UTC (permalink / raw)
  To: Wei Liu, Xen-devel; +Cc: George Dunlap, Tim Deegan, Jan Beulich

On 27/03/17 10:10, Wei Liu wrote:
> We will later split out PV specific code to pv/mm.c.
>
> No functional change.
>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

Please could you take the time to correctly mfn_t/gfn_t the calls. 
Having both addr and frame, and them being otherwise unqualified, is
confusing to read.

One idea I had while starting the hypercall work was to introduce a
"struct guest_type_ops" to contain some function pointers for operations we
perform on all guests, irrespective of type.  My first candidate for
splitting this way was the hypercall page writing.

This splitting here is subtly different.  It's more "struct
paging_type_op", but still common operations we would need to perform
for guests.

I was thinking that ops structures like this would be cleaner to isolate
than needing an explicit dispatch function such as
create_grant_host_mapping() below.
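
To make that concrete, roughly (all names here are illustrative; nothing
like this exists in the tree yet):

struct guest_type_ops {
    int (*create_grant_mapping)(uint64_t addr, unsigned long frame,
                                unsigned int flags, unsigned int cache_flags);
    int (*replace_grant_mapping)(uint64_t addr, unsigned long frame,
                                 uint64_t new_addr, unsigned int flags);
    /* write_hypercall_page, ... */
};

/* Each guest type supplies its own instance, e.g. for PV: */
static const struct guest_type_ops pv_guest_ops = {
    .create_grant_mapping  = create_grant_pv_mapping,
    .replace_grant_mapping = replace_grant_pv_mapping,
};

and common code would then call d->guest_ops->create_grant_mapping(...)
rather than going through an explicit dispatch function.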

Thoughts, (seeing as this is the first time I have floated this idea on
xen-devel) ?

~Andrew


* Re: [PATCH RFC 04/13] x86/mm: carve out create_grant_pv_mapping
  2017-03-29 10:27   ` Andrew Cooper
@ 2017-03-29 10:45     ` Jan Beulich
  2017-03-29 10:49       ` Andrew Cooper
  2017-04-03  8:40     ` Wei Liu
  1 sibling, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2017-03-29 10:45 UTC (permalink / raw)
  To: Andrew Cooper, Wei Liu; +Cc: George Dunlap, Xen-devel, Tim Deegan

>>> On 29.03.17 at 12:27, <andrew.cooper3@citrix.com> wrote:
> One idea I had while starting the hypercall work was to introduce a
> "struct guest_type_ops" to contain some function pointers for options we
> perform on all guests, irrespective of type.  My first candidate for
> splitting this way was the hypercall page writing.
> 
> This splitting here is subtly different.  It's more "struct
> paging_type_op", but still common operations we would need to perform
> for guests.
> 
> I was thinking that ops structures like this would be cleaner to isolate
> than needing an explicit dispatch function such as
> create_grant_host_mapping() below.
> 
> Thoughts, (seeing as this is the first time I have floated this idea on
> xen-devel) ?

I think that's reasonable as long as we don't go too far with this
(indirect calls after all generally having a higher overhead than
mis-predicted branches).

Jan



* Re: [PATCH RFC 03/13] x86/mm: export vcpumask_to_pcpumask
  2017-03-29 10:15   ` Andrew Cooper
@ 2017-03-29 10:49     ` Wei Liu
  0 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-03-29 10:49 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Xen-devel, Wei Liu, Jan Beulich, Tim Deegan

On Wed, Mar 29, 2017 at 11:15:18AM +0100, Andrew Cooper wrote:
> On 27/03/17 10:10, Wei Liu wrote:
> > Prefix it with "mm_". No functional change.
> >
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > ---
> > It used to be a inline function, but I don't think there is no point in
> 
> Double negative.  "think there is any point" ?
> 

Right. That's what I meant.

> mmuext_op was added to HVM guests in c/s 2204b1ce06ac for PVHv1. 
> However, everything in it is PV specific, so I think it is reasonable to
> make its reference in the HVM hypercall table conditional on CONFIG_PV,
> and for it to move fully into pv/mm.c
> 

I am certainly fine with this.

> At that point, this function can move as well.
> 
> ~Andrew


* Re: [PATCH RFC 04/13] x86/mm: carve out create_grant_pv_mapping
  2017-03-29 10:45     ` Jan Beulich
@ 2017-03-29 10:49       ` Andrew Cooper
  2017-03-29 11:02         ` Wei Liu
  0 siblings, 1 reply; 30+ messages in thread
From: Andrew Cooper @ 2017-03-29 10:49 UTC (permalink / raw)
  To: Jan Beulich, Wei Liu; +Cc: George Dunlap, Xen-devel, Tim Deegan

On 29/03/17 11:45, Jan Beulich wrote:
>>>> On 29.03.17 at 12:27, <andrew.cooper3@citrix.com> wrote:
>> One idea I had while starting the hypercall work was to introduce a
>> "struct guest_type_ops" to contain some function pointers for options we
>> perform on all guests, irrespective of type.  My first candidate for
>> splitting this way was the hypercall page writing.
>>
>> This splitting here is subtly different.  It's more "struct
>> paging_type_op", but still common operations we would need to perform
>> for guests.
>>
>> I was thinking that ops structures like this would be cleaner to isolate
>> than needing an explicit dispatch function such as
>> create_grant_host_mapping() below.
>>
>> Thoughts, (seeing as this is the first time I have floated this idea on
>> xen-devel) ?
> I think that's reasonable as long as we don't go too far with this
> (indirect calls after all generally having a higher overhead than
> mis-predicted branches).

Without any #ifdefary in a source tree, that is fine.  It is harder
however if you want to conditionally compile things out.

An alternative option would be to have a static inline in a header file
which contains the appropriate #ifdefary internally.
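
e.g. (a sketch; the CONFIG_HVM guard here is illustrative):

/* In a header: compile-time dispatch, no indirect call. */
static inline int create_grant_host_mapping(uint64_t addr, unsigned long frame,
                                            unsigned int flags,
                                            unsigned int cache_flags)
{
#ifdef CONFIG_HVM
    if ( paging_mode_external(current->domain) )
        return create_grant_p2m_mapping(addr, frame, flags, cache_flags);
#endif
    return create_grant_pv_mapping(addr, frame, flags, cache_flags);
}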

~Andrew


* Re: [PATCH RFC 02/13] x86/mm: move MEM_LOG to asm-x86/mm.h
  2017-03-28 21:14   ` Andrew Cooper
@ 2017-03-29 10:50     ` Wei Liu
  0 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-03-29 10:50 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Xen-devel, Wei Liu, Jan Beulich, Tim Deegan

On Tue, Mar 28, 2017 at 10:14:13PM +0100, Andrew Cooper wrote:
> On 27/03/2017 10:10, Wei Liu wrote:
> > It will be needed later when we split mm.c.
> >
> > No functional change.
> >
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> 
> I have a pending patch to take MEM_LOG() out entirely, along with
> correcting some of the information printed.
> 
> I'd prefer not to perpetuate its existence.
> 

Sure. Please send me that patch and I will stick it at the beginning.

Wei.

> ~Andrew


* Re: [PATCH RFC 04/13] x86/mm: carve out create_grant_pv_mapping
  2017-03-29 10:49       ` Andrew Cooper
@ 2017-03-29 11:02         ` Wei Liu
  0 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-03-29 11:02 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Xen-devel, Wei Liu, Jan Beulich, Tim Deegan

On Wed, Mar 29, 2017 at 11:49:54AM +0100, Andrew Cooper wrote:
> On 29/03/17 11:45, Jan Beulich wrote:
> >>>> On 29.03.17 at 12:27, <andrew.cooper3@citrix.com> wrote:
> >> One idea I had while starting the hypercall work was to introduce a
> >> "struct guest_type_ops" to contain some function pointers for options we
> >> perform on all guests, irrespective of type.  My first candidate for
> >> splitting this way was the hypercall page writing.
> >>
> >> This splitting here is subtly different.  It's more "struct
> >> paging_type_op", but still common operations we would need to perform
> >> for guests.
> >>
> >> I was thinking that ops structures like this would be cleaner to isolate
> >> than needing an explicit dispatch function such as
> >> create_grant_host_mapping() below.
> >>
> >> Thoughts, (seeing as this is the first time I have floated this idea on
> >> xen-devel) ?
> > I think that's reasonable as long as we don't go too far with this
> > (indirect calls after all generally having a higher overhead than
> > mis-predicted branches).
> 
> Without any #ifdefary in a source tree, that is fine.  It is harder
> however if you want to conditionally compile things out.
> 
> An alternative option would be to have a static inline in a header file
> which contains the appropriate #ifdefary internally.
> 

Both sound reasonable to me. I slightly prefer guest_type_ops because it
seems cleaner.

The overhead concern applies to hot path code -- I don't think most
guest type specific hypercalls qualify.

Wei.

> ~Andrew


* Re: [PATCH RFC 01/13] x86/mm: export {get, put}_pg_owner
  2017-03-29  9:10       ` Andrew Cooper
@ 2017-03-30 10:07         ` Wei Liu
  2017-03-30 12:25           ` Jan Beulich
  0 siblings, 1 reply; 30+ messages in thread
From: Wei Liu @ 2017-03-30 10:07 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Xen-devel, Wei Liu, Jan Beulich, Tim Deegan

On Wed, Mar 29, 2017 at 10:10:39AM +0100, Andrew Cooper wrote:
> On 29/03/17 10:03, Jan Beulich wrote:
> >>>> On 28.03.17 at 23:11, <andrew.cooper3@citrix.com> wrote:
> >> On 27/03/2017 10:10, Wei Liu wrote:
> >>> Prefix them with "mm_" and add declarations to asm-x86/mm.h.
> >>>
> >>> They will be needed when we split PV specific code out of x86/mm.c.
> > Is that actually the case? They're about PV (target) domains, so
> > I'd kind of expect them to move together with the PV-only code,
> > even if the caller may not be PV.
> >
> >>> No functional change.
> >>>
> >>> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> >> I have to admit that I don't understand why they are called
> >> {get,put}_pg_owner.  Perhaps very historical from Linux?
> > I don't think any of this code has Linux origin.
> >
> >> They are nothing to do with pages, and get a reference on the domain.
> > Depends on the perspective you take: For all of their callers,
> > they have precisely that meaning.
> >
> >> I'd recommend s/pg_owner/domain/ so the function calls actually indicate
> >> what object is having the reference taken on it.
> > Well, to make clear what uses are legitimate, perhaps
> > s/pg_owner/foreign_domain/ (if you really continue to think
> > these should be renamed in the first place)? Using just "domain"
> > pretty clearly results in too generic names. Perhaps additionally
> > they should be prefixed mm_?
> 
> Sorry - I had intended my suggestion to be in combination with the mm_
> prefixes, so mm_{get,put}_domain().

Fine by me. I've always found the original names cryptic.

Assuming exporting the functions is still necessary, I will use the
suggested names.

> 
> ~Andrew


* Re: [PATCH RFC 01/13] x86/mm: export {get, put}_pg_owner
  2017-03-30 10:07         ` Wei Liu
@ 2017-03-30 12:25           ` Jan Beulich
  2017-03-30 12:48             ` Wei Liu
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Beulich @ 2017-03-30 12:25 UTC (permalink / raw)
  To: Wei Liu; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Xen-devel

>>> On 30.03.17 at 12:07, <wei.liu2@citrix.com> wrote:
> Assuming exporting the functions is still necessary, I will use the
> suggested names.

Well, I've asked before: Are they really needed outside of PV-specific
code? It would be odd if so ...

Jan



* Re: [PATCH RFC 01/13] x86/mm: export {get, put}_pg_owner
  2017-03-30 12:25           ` Jan Beulich
@ 2017-03-30 12:48             ` Wei Liu
  0 siblings, 0 replies; 30+ messages in thread
From: Wei Liu @ 2017-03-30 12:48 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Xen-devel

On Thu, Mar 30, 2017 at 06:25:33AM -0600, Jan Beulich wrote:
> >>> On 30.03.17 at 12:07, <wei.liu2@citrix.com> wrote:
> > Assuming exporting the functions is still necessary, I will use the
> > suggested names.
> 
> Well, I've asked before: Are they really needed outside of PV-specific
> code? It would be odd if so ...
> 

I need to check this later. That's why I said "assuming...".

> Jan
> 


* Re: [PATCH RFC 04/13] x86/mm: carve out create_grant_pv_mapping
  2017-03-29 10:27   ` Andrew Cooper
  2017-03-29 10:45     ` Jan Beulich
@ 2017-04-03  8:40     ` Wei Liu
  2017-04-03 14:05       ` Andrew Cooper
  1 sibling, 1 reply; 30+ messages in thread
From: Wei Liu @ 2017-04-03  8:40 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: George Dunlap, Xen-devel, Wei Liu, Jan Beulich, Tim Deegan

On Wed, Mar 29, 2017 at 11:27:49AM +0100, Andrew Cooper wrote:
> On 27/03/17 10:10, Wei Liu wrote:
> > We will later split out PV specific code to pv/mm.c.
> >
> > No functional change.
> >
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> 
> Please could you take the time to correctly mfn_t/gfn_t the calls. 
> Having both addr and frame, and them being otherwise unqualified, is
> confusing to read.
> 

I'm not sure what you want here.

The create_grant_host_mapping function is hooked into the grant table code,
which uses "unsigned long" everywhere.

Modifying it here would require fixing all call sites as well. It'd be
better done in a separate patch.

Wei.


* Re: [PATCH RFC 04/13] x86/mm: carve out create_grant_pv_mapping
  2017-04-03  8:40     ` Wei Liu
@ 2017-04-03 14:05       ` Andrew Cooper
  0 siblings, 0 replies; 30+ messages in thread
From: Andrew Cooper @ 2017-04-03 14:05 UTC (permalink / raw)
  To: Wei Liu; +Cc: George Dunlap, Xen-devel, Tim Deegan, Jan Beulich

On 03/04/17 09:40, Wei Liu wrote:
> On Wed, Mar 29, 2017 at 11:27:49AM +0100, Andrew Cooper wrote:
>> On 27/03/17 10:10, Wei Liu wrote:
>>> We will later split out PV specific code to pv/mm.c.
>>>
>>> No functional change.
>>>
>>> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
>> Please could you take the time to correctly mfn_t/gfn_t the calls. 
>> Having both addr and frame, and them being otherwise unqualified, is
>> confusing to read.
>>
> I'm not sure what you want here.
>
> The create_grant_host_mapping function is hooked into the grant table code,
> which uses "unsigned long" everywhere.
>
> Modifying it here would require fixing all call sites as well. It'd be
> better done in a separate patch.

Separate patch is fine, but I purposefully want to avoid moving code
into pv/ and leaving it with style problems.  (i.e. take the time to
ensure the status quo improves).

Fix it then move it, or move it then fix it, but please don't move it
and leave it unfixed.  This covers other areas like bool correctness,
mfn_t/gfn_t, const correctness, comment styles, whitespace, etc.
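
For the two grant paths in this patch, that would mean something like the
below (a sketch; converting the callers in the common grant table code
would need to happen at the same time):

/* Before: "frame" is an unqualified integer -- GFN or MFN? */
int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
                            unsigned int flags, unsigned int cache_flags);

/* After: the type documents the unit and the compiler enforces it. */
int create_grant_pv_mapping(uint64_t addr, mfn_t frame,
                            unsigned int flags, unsigned int cache_flags);

/* Callers then wrap/unwrap explicitly with the typesafe helpers: */
rc = create_grant_pv_mapping(addr, _mfn(frame), flags, cache_flags);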

~Andrew

