* [PATCH v3 00/21] x86: refactor mm.c (the easy part)
@ 2017-07-20 16:04 Wei Liu
  2017-07-20 16:04 ` [PATCH v3 01/21] x86/mm: carve out create_grant_pv_mapping Wei Liu
                   ` (22 more replies)
  0 siblings, 23 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-20 16:04 UTC
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

This series is part of my work to refactor x86/mm.c. It has grown into a
21-patch series, so I figure I should get these patches approved first
before making more changes.

What is left is mostly the PV MMU hypercall functions and their supporting
code. I'm still thinking about how to refactor those, because the helper
functions are a bit convoluted: they are used by both the PV MMU code and the
common get / put functions. I think I need to refactor the get / put
functions. If you think there is a better approach, please let me know.

The code can be found at:
   https://xenbits.xen.org/git-http/people/liuw/xen.git wip.split-mm-v3.1

Wei.

Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Wei Liu (21):
  x86/mm: carve out create_grant_pv_mapping
  x86/mm: carve out replace_grant_pv_mapping
  x86/mm: split HVM grant table code to hvm/grant_table.c
  x86/mm: lift PAGE_CACHE_ATTRS to page.h
  x86/mm: document the return values from get_page_from_l*e
  x86: move pv_emul_is_mem_write to pv/emulate.c
  x86/mm: move and rename guest_get_eff{,kern}_l1e
  x86/mm: export get_page_from_pagenr
  x86/mm: rename and move update_intpte
  x86/mm: move {un,}adjust_guest_* to pv/mm.h
  x86/mm: split out writable pagetable emulation code
  x86/mm: split out readonly MMIO emulation code
  x86/mm: remove the unused inclusion of pv/emulate.h
  x86/mm: move and rename guest_{,un}map_l1e
  x86/mm: split out PV grant table code
  x86/mm: split out descriptor table code
  x86/mm: move compat descriptor handling code
  x86/mm: move and rename map_ldt_shadow_page
  x86/mm: factor out pv_arch_init_memory
  x86/mm: move l4 table setup code
  x86/mm: add "pv_" prefix to new_guest_cr3

 xen/arch/x86/domain.c                 |   11 +-
 xen/arch/x86/hvm/Makefile             |    1 +
 xen/arch/x86/hvm/grant_table.c        |   89 ++
 xen/arch/x86/mm.c                     | 1556 ++++-----------------------------
 xen/arch/x86/pv/Makefile              |    5 +
 xen/arch/x86/pv/descriptor-tables.c   |  270 ++++++
 xen/arch/x86/pv/dom0_build.c          |    3 +-
 xen/arch/x86/pv/domain.c              |    3 +-
 xen/arch/x86/pv/emul-mmio-op.c        |  166 ++++
 xen/arch/x86/pv/emul-priv-op.c        |    3 +-
 xen/arch/x86/pv/emul-ptwr-op.c        |  327 +++++++
 xen/arch/x86/pv/emulate.c             |    7 +
 xen/arch/x86/pv/emulate.h             |    5 +
 xen/arch/x86/pv/grant_table.c         |  386 ++++++++
 xen/arch/x86/pv/mm.c                  |  222 +++++
 xen/arch/x86/traps.c                  |    5 +-
 xen/arch/x86/x86_64/compat/mm.c       |   39 -
 xen/include/asm-x86/grant_table.h     |   26 +-
 xen/include/asm-x86/hvm/grant_table.h |   61 ++
 xen/include/asm-x86/mm.h              |    6 +-
 xen/include/asm-x86/page.h            |    2 +
 xen/include/asm-x86/processor.h       |    5 -
 xen/include/asm-x86/pv/grant_table.h  |   60 ++
 xen/include/asm-x86/pv/mm.h           |  141 +++
 xen/include/asm-x86/pv/processor.h    |   42 +
 25 files changed, 1972 insertions(+), 1469 deletions(-)
 create mode 100644 xen/arch/x86/hvm/grant_table.c
 create mode 100644 xen/arch/x86/pv/descriptor-tables.c
 create mode 100644 xen/arch/x86/pv/emul-mmio-op.c
 create mode 100644 xen/arch/x86/pv/emul-ptwr-op.c
 create mode 100644 xen/arch/x86/pv/grant_table.c
 create mode 100644 xen/arch/x86/pv/mm.c
 create mode 100644 xen/include/asm-x86/hvm/grant_table.h
 create mode 100644 xen/include/asm-x86/pv/grant_table.h
 create mode 100644 xen/include/asm-x86/pv/mm.h
 create mode 100644 xen/include/asm-x86/pv/processor.h

-- 
2.11.0


* [PATCH v3 01/21] x86/mm: carve out create_grant_pv_mapping
  2017-07-20 16:04 [PATCH v3 00/21] x86: refactor mm.c (the easy part) Wei Liu
@ 2017-07-20 16:04 ` Wei Liu
  2017-08-28 15:16   ` George Dunlap
  2017-07-20 16:04 ` [PATCH v3 02/21] x86/mm: carve out replace_grant_pv_mapping Wei Liu
                   ` (21 subsequent siblings)
  22 siblings, 1 reply; 39+ messages in thread
From: Wei Liu @ 2017-07-20 16:04 UTC
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

At the same time, make create_grant_host_mapping an inline function.  This
requires making create_grant_{p2m,pv}_mapping non-static.  Provide
{hvm,pv}/grant_table.h and include the headers where necessary.

The two functions create_grant_{p2m,pv}_mapping will be moved later in
a dedicated patch together with all their helpers.
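
With this, the HVM/PV dispatch happens in the inline wrapper rather than
inside mm.c. A caller in the common grant table code ends up with something
along these lines (a sketch of the pattern only, not the exact code in
common/grant_table.c; variable names are illustrative):

    /*
     * Resolves to create_grant_p2m_mapping() for paging_mode_external
     * domains, and to create_grant_pv_mapping() otherwise.
     */
    rc = create_grant_host_mapping(op->host_addr, frame, op->flags,
                                   cache_flags);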

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c                     | 16 +++++------
 xen/include/asm-x86/grant_table.h     | 16 +++++++++--
 xen/include/asm-x86/hvm/grant_table.h | 53 +++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/pv/grant_table.h  | 52 ++++++++++++++++++++++++++++++++++
 4 files changed, 127 insertions(+), 10 deletions(-)
 create mode 100644 xen/include/asm-x86/hvm/grant_table.h
 create mode 100644 xen/include/asm-x86/pv/grant_table.h

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 19f672d880..532b1ee7e7 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -123,6 +123,9 @@
 #include <asm/io_apic.h>
 #include <asm/pci.h>
 
+#include <asm/hvm/grant_table.h>
+#include <asm/pv/grant_table.h>
+
 /* Mapping of the fixmap space needed early. */
 l1_pgentry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
     l1_fixmap[L1_PAGETABLE_ENTRIES];
@@ -4242,9 +4245,9 @@ static int destroy_grant_va_mapping(
     return replace_grant_va_mapping(addr, frame, l1e_empty(), v);
 }
 
-static int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
-                                    unsigned int flags,
-                                    unsigned int cache_flags)
+int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
+                             unsigned int flags,
+                             unsigned int cache_flags)
 {
     p2m_type_t p2mt;
     int rc;
@@ -4265,15 +4268,12 @@ static int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
         return GNTST_okay;
 }
 
-int create_grant_host_mapping(uint64_t addr, unsigned long frame,
-                              unsigned int flags, unsigned int cache_flags)
+int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
+                            unsigned int flags, unsigned int cache_flags)
 {
     l1_pgentry_t pte;
     uint32_t grant_pte_flags;
 
-    if ( paging_mode_external(current->domain) )
-        return create_grant_p2m_mapping(addr, frame, flags, cache_flags);
-
     grant_pte_flags =
         _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_GNTTAB;
     if ( cpu_has_nx )
diff --git a/xen/include/asm-x86/grant_table.h b/xen/include/asm-x86/grant_table.h
index 32d0a864b6..4aa22126d3 100644
--- a/xen/include/asm-x86/grant_table.h
+++ b/xen/include/asm-x86/grant_table.h
@@ -7,14 +7,26 @@
 #ifndef __ASM_GRANT_TABLE_H__
 #define __ASM_GRANT_TABLE_H__
 
+#include <asm/paging.h>
+
+#include <asm/hvm/grant_table.h>
+#include <asm/pv/grant_table.h>
+
 #define INITIAL_NR_GRANT_FRAMES 4
 
 /*
  * Caller must own caller's BIGLOCK, is responsible for flushing the TLB, and
  * must hold a reference to the page.
  */
-int create_grant_host_mapping(uint64_t addr, unsigned long frame,
-			      unsigned int flags, unsigned int cache_flags);
+static inline int create_grant_host_mapping(uint64_t addr, unsigned long frame,
+                                            unsigned int flags,
+                                            unsigned int cache_flags)
+{
+    if ( paging_mode_external(current->domain) )
+        return create_grant_p2m_mapping(addr, frame, flags, cache_flags);
+    return create_grant_pv_mapping(addr, frame, flags, cache_flags);
+}
+
 int replace_grant_host_mapping(
     uint64_t addr, unsigned long frame, uint64_t new_addr, unsigned int flags);
 
diff --git a/xen/include/asm-x86/hvm/grant_table.h b/xen/include/asm-x86/hvm/grant_table.h
new file mode 100644
index 0000000000..83202c219c
--- /dev/null
+++ b/xen/include/asm-x86/hvm/grant_table.h
@@ -0,0 +1,53 @@
+/*
+ * asm-x86/hvm/grant_table.h
+ *
+ * Grant table interfaces for HVM guests
+ *
+ * Copyright (C) 2017 Wei Liu <wei.liu2@citrix.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __X86_HVM_GRANT_TABLE_H__
+#define __X86_HVM_GRANT_TABLE_H__
+
+#ifdef CONFIG_HVM
+
+int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
+                             unsigned int flags,
+                             unsigned int cache_flags);
+
+#else
+
+#include <public/grant_table.h>
+
+static inline int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
+                                           unsigned int flags,
+                                           unsigned int cache_flags)
+{
+    return GNTST_general_error;
+}
+
+#endif
+
+#endif /* __X86_HVM_GRANT_TABLE_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/pv/grant_table.h b/xen/include/asm-x86/pv/grant_table.h
new file mode 100644
index 0000000000..165ebce22f
--- /dev/null
+++ b/xen/include/asm-x86/pv/grant_table.h
@@ -0,0 +1,52 @@
+/*
+ * asm-x86/pv/grant_table.h
+ *
+ * Grant table interfaces for PV guests
+ *
+ * Copyright (C) 2017 Wei Liu <wei.liu2@citrix.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __X86_PV_GRANT_TABLE_H__
+#define __X86_PV_GRANT_TABLE_H__
+
+#ifdef CONFIG_PV
+
+int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
+                            unsigned int flags, unsigned int cache_flags);
+
+#else
+
+#include <public/grant_table.h>
+
+static inline int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
+                                          unsigned int flags,
+                                          unsigned int cache_flags)
+{
+    return GNTST_general_error;
+}
+
+#endif
+
+#endif /* __X86_PV_GRANT_TABLE_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.11.0


* [PATCH v3 02/21] x86/mm: carve out replace_grant_pv_mapping
  2017-07-20 16:04 [PATCH v3 00/21] x86: refactor mm.c (the easy part) Wei Liu
  2017-07-20 16:04 ` [PATCH v3 01/21] x86/mm: carve out create_grant_pv_mapping Wei Liu
@ 2017-07-20 16:04 ` Wei Liu
  2017-08-28 15:19   ` George Dunlap
  2017-07-20 16:04 ` [PATCH v3 03/21] x86/mm: split HVM grant table code to hvm/grant_table.c Wei Liu
                   ` (20 subsequent siblings)
  22 siblings, 1 reply; 39+ messages in thread
From: Wei Liu @ 2017-07-20 16:04 UTC
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

At the same time, make it an inline function. Add declarations of
replace_grant_{p2m,pv}_mapping to the respective header files.

The code movement will be done later.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c                     |  9 +++------
 xen/include/asm-x86/grant_table.h     | 10 ++++++++--
 xen/include/asm-x86/hvm/grant_table.h |  8 ++++++++
 xen/include/asm-x86/pv/grant_table.h  |  8 ++++++++
 4 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 532b1ee7e7..defc2c9bcc 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4296,7 +4296,7 @@ int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
     return create_grant_va_mapping(addr, pte, current);
 }
 
-static int replace_grant_p2m_mapping(
+int replace_grant_p2m_mapping(
     uint64_t addr, unsigned long frame, uint64_t new_addr, unsigned int flags)
 {
     unsigned long gfn = (unsigned long)(addr >> PAGE_SHIFT);
@@ -4326,8 +4326,8 @@ static int replace_grant_p2m_mapping(
     return GNTST_okay;
 }
 
-int replace_grant_host_mapping(
-    uint64_t addr, unsigned long frame, uint64_t new_addr, unsigned int flags)
+int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
+                             uint64_t new_addr, unsigned int flags)
 {
     struct vcpu *curr = current;
     l1_pgentry_t *pl1e, ol1e;
@@ -4335,9 +4335,6 @@ int replace_grant_host_mapping(
     struct page_info *l1pg;
     int rc;
 
-    if ( paging_mode_external(current->domain) )
-        return replace_grant_p2m_mapping(addr, frame, new_addr, flags);
-
     if ( flags & GNTMAP_contains_pte )
     {
         if ( !new_addr )
diff --git a/xen/include/asm-x86/grant_table.h b/xen/include/asm-x86/grant_table.h
index 4aa22126d3..6c98672a4d 100644
--- a/xen/include/asm-x86/grant_table.h
+++ b/xen/include/asm-x86/grant_table.h
@@ -27,8 +27,14 @@ static inline int create_grant_host_mapping(uint64_t addr, unsigned long frame,
     return create_grant_pv_mapping(addr, frame, flags, cache_flags);
 }
 
-int replace_grant_host_mapping(
-    uint64_t addr, unsigned long frame, uint64_t new_addr, unsigned int flags);
+static inline int replace_grant_host_mapping(uint64_t addr, unsigned long frame,
+                                             uint64_t new_addr,
+                                             unsigned int flags)
+{
+    if ( paging_mode_external(current->domain) )
+        return replace_grant_p2m_mapping(addr, frame, new_addr, flags);
+    return replace_grant_pv_mapping(addr, frame, new_addr, flags);
+}
 
 #define gnttab_create_shared_page(d, t, i)                               \
     do {                                                                 \
diff --git a/xen/include/asm-x86/hvm/grant_table.h b/xen/include/asm-x86/hvm/grant_table.h
index 83202c219c..4b1afa179b 100644
--- a/xen/include/asm-x86/hvm/grant_table.h
+++ b/xen/include/asm-x86/hvm/grant_table.h
@@ -26,6 +26,8 @@
 int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
                              unsigned int flags,
                              unsigned int cache_flags);
+int replace_grant_p2m_mapping(uint64_t addr, unsigned long frame,
+                              uint64_t new_addr, unsigned int flags);
 
 #else
 
@@ -38,6 +40,12 @@ static inline int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
     return GNTST_general_error;
 }
 
+static inline int replace_grant_p2m_mapping(uint64_t addr, unsigned long frame,
+                                            uint64_t new_addr, unsigned int flags)
+{
+    return GNTST_general_error;
+}
+
 #endif
 
 #endif /* __X86_HVM_GRANT_TABLE_H__ */
diff --git a/xen/include/asm-x86/pv/grant_table.h b/xen/include/asm-x86/pv/grant_table.h
index 165ebce22f..c6474973cd 100644
--- a/xen/include/asm-x86/pv/grant_table.h
+++ b/xen/include/asm-x86/pv/grant_table.h
@@ -25,6 +25,8 @@
 
 int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
                             unsigned int flags, unsigned int cache_flags);
+int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
+                             uint64_t new_addr, unsigned int flags);
 
 #else
 
@@ -37,6 +39,12 @@ static inline int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
     return GNTST_general_error;
 }
 
+static inline int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
+                                           uint64_t new_addr, unsigned int flags)
+{
+    return GNTST_general_error;
+}
+
 #endif
 
 #endif /* __X86_PV_GRANT_TABLE_H__ */
-- 
2.11.0


* [PATCH v3 03/21] x86/mm: split HVM grant table code to hvm/grant_table.c
  2017-07-20 16:04 [PATCH v3 00/21] x86: refactor mm.c (the easy part) Wei Liu
  2017-07-20 16:04 ` [PATCH v3 01/21] x86/mm: carve out create_grant_pv_mapping Wei Liu
  2017-07-20 16:04 ` [PATCH v3 02/21] x86/mm: carve out replace_grant_pv_mapping Wei Liu
@ 2017-07-20 16:04 ` Wei Liu
  2017-07-20 16:04 ` [PATCH v3 04/21] x86/mm: lift PAGE_CACHE_ATTRS to page.h Wei Liu
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-20 16:04 UTC
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/hvm/Makefile      |  1 +
 xen/arch/x86/hvm/grant_table.c | 89 ++++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/mm.c              | 53 -------------------------
 3 files changed, 90 insertions(+), 53 deletions(-)
 create mode 100644 xen/arch/x86/hvm/grant_table.c

diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
index c394af7364..5bd38f633f 100644
--- a/xen/arch/x86/hvm/Makefile
+++ b/xen/arch/x86/hvm/Makefile
@@ -6,6 +6,7 @@ obj-y += dm.o
 obj-bin-y += dom0_build.init.o
 obj-y += domain.o
 obj-y += emulate.o
+obj-y += grant_table.o
 obj-y += hpet.o
 obj-y += hvm.o
 obj-y += hypercall.o
diff --git a/xen/arch/x86/hvm/grant_table.c b/xen/arch/x86/hvm/grant_table.c
new file mode 100644
index 0000000000..7503c2c61b
--- /dev/null
+++ b/xen/arch/x86/hvm/grant_table.c
@@ -0,0 +1,89 @@
+/******************************************************************************
+ * arch/x86/hvm/grant_table.c
+ *
+ * Grant table interfaces for HVM guests
+ *
+ * Copyright (C) 2017 Wei Liu <wei.liu2@citrix.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/types.h>
+
+#include <public/grant_table.h>
+
+#include <asm/p2m.h>
+
+int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
+                             unsigned int flags,
+                             unsigned int cache_flags)
+{
+    p2m_type_t p2mt;
+    int rc;
+
+    if ( cache_flags  || (flags & ~GNTMAP_readonly) != GNTMAP_host_map )
+        return GNTST_general_error;
+
+    if ( flags & GNTMAP_readonly )
+        p2mt = p2m_grant_map_ro;
+    else
+        p2mt = p2m_grant_map_rw;
+    rc = guest_physmap_add_entry(current->domain,
+                                 _gfn(addr >> PAGE_SHIFT),
+                                 _mfn(frame), PAGE_ORDER_4K, p2mt);
+    if ( rc )
+        return GNTST_general_error;
+    else
+        return GNTST_okay;
+}
+
+int replace_grant_p2m_mapping(uint64_t addr, unsigned long frame,
+                              uint64_t new_addr, unsigned int flags)
+{
+    unsigned long gfn = (unsigned long)(addr >> PAGE_SHIFT);
+    p2m_type_t type;
+    mfn_t old_mfn;
+    struct domain *d = current->domain;
+
+    if ( new_addr != 0 || (flags & GNTMAP_contains_pte) )
+        return GNTST_general_error;
+
+    old_mfn = get_gfn(d, gfn, &type);
+    if ( !p2m_is_grant(type) || mfn_x(old_mfn) != frame )
+    {
+        put_gfn(d, gfn);
+        gdprintk(XENLOG_WARNING,
+                 "old mapping invalid (type %d, mfn %" PRI_mfn ", frame %lx)\n",
+                 type, mfn_x(old_mfn), frame);
+        return GNTST_general_error;
+    }
+    if ( guest_physmap_remove_page(d, _gfn(gfn), _mfn(frame), PAGE_ORDER_4K) )
+    {
+        put_gfn(d, gfn);
+        return GNTST_general_error;
+    }
+
+    put_gfn(d, gfn);
+    return GNTST_okay;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index defc2c9bcc..4e6f9f5750 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4245,29 +4245,6 @@ static int destroy_grant_va_mapping(
     return replace_grant_va_mapping(addr, frame, l1e_empty(), v);
 }
 
-int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
-                             unsigned int flags,
-                             unsigned int cache_flags)
-{
-    p2m_type_t p2mt;
-    int rc;
-
-    if ( cache_flags  || (flags & ~GNTMAP_readonly) != GNTMAP_host_map )
-        return GNTST_general_error;
-
-    if ( flags & GNTMAP_readonly )
-        p2mt = p2m_grant_map_ro;
-    else
-        p2mt = p2m_grant_map_rw;
-    rc = guest_physmap_add_entry(current->domain,
-                                 _gfn(addr >> PAGE_SHIFT),
-                                 _mfn(frame), PAGE_ORDER_4K, p2mt);
-    if ( rc )
-        return GNTST_general_error;
-    else
-        return GNTST_okay;
-}
-
 int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
                             unsigned int flags, unsigned int cache_flags)
 {
@@ -4296,36 +4273,6 @@ int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
     return create_grant_va_mapping(addr, pte, current);
 }
 
-int replace_grant_p2m_mapping(
-    uint64_t addr, unsigned long frame, uint64_t new_addr, unsigned int flags)
-{
-    unsigned long gfn = (unsigned long)(addr >> PAGE_SHIFT);
-    p2m_type_t type;
-    mfn_t old_mfn;
-    struct domain *d = current->domain;
-
-    if ( new_addr != 0 || (flags & GNTMAP_contains_pte) )
-        return GNTST_general_error;
-
-    old_mfn = get_gfn(d, gfn, &type);
-    if ( !p2m_is_grant(type) || mfn_x(old_mfn) != frame )
-    {
-        put_gfn(d, gfn);
-        gdprintk(XENLOG_WARNING,
-                 "old mapping invalid (type %d, mfn %" PRI_mfn ", frame %lx)\n",
-                 type, mfn_x(old_mfn), frame);
-        return GNTST_general_error;
-    }
-    if ( guest_physmap_remove_page(d, _gfn(gfn), _mfn(frame), PAGE_ORDER_4K) )
-    {
-        put_gfn(d, gfn);
-        return GNTST_general_error;
-    }
-
-    put_gfn(d, gfn);
-    return GNTST_okay;
-}
-
 int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
                              uint64_t new_addr, unsigned int flags)
 {
-- 
2.11.0


* [PATCH v3 04/21] x86/mm: lift PAGE_CACHE_ATTRS to page.h
  2017-07-20 16:04 [PATCH v3 00/21] x86: refactor mm.c (the easy part) Wei Liu
                   ` (2 preceding siblings ...)
  2017-07-20 16:04 ` [PATCH v3 03/21] x86/mm: split HVM grant table code to hvm/grant_table.c Wei Liu
@ 2017-07-20 16:04 ` Wei Liu
  2017-07-20 16:04 ` [PATCH v3 05/21] x86/mm: document the return values from get_page_from_l*e Wei Liu
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-20 16:04 UTC
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Currently all the users are within x86/mm.c, but that will change once
the PV-specific mm code is split into another file. Lift the definition to
page.h alongside the _PAGE_* constants in preparation for later patches.

No functional change. Add some spaces around "|" while moving.
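
For reference, PAGE_CACHE_ATTRS collects the three PTE bits that control
cacheability, so callers can mask them in one operation (a sketch of the
typical use, assuming an l1e in hand; not code added by this patch):

    /* Extract only the PAT/PCD/PWT cacheability bits of a PTE. */
    unsigned int cacheattr = l1e_get_flags(l1e) & PAGE_CACHE_ATTRS;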

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c          | 2 --
 xen/include/asm-x86/page.h | 2 ++
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 4e6f9f5750..8d7ceff9c8 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -151,8 +151,6 @@ bool __read_mostly machine_to_phys_mapping_valid;
 
 struct rangeset *__read_mostly mmio_ro_ranges;
 
-#define PAGE_CACHE_ATTRS (_PAGE_PAT|_PAGE_PCD|_PAGE_PWT)
-
 bool __read_mostly opt_allow_superpage;
 boolean_param("allowsuperpage", opt_allow_superpage);
 
diff --git a/xen/include/asm-x86/page.h b/xen/include/asm-x86/page.h
index 474b9bde78..80dca02516 100644
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -315,6 +315,8 @@ void efi_update_l4_pgtable(unsigned int l4idx, l4_pgentry_t);
 #define _PAGE_AVAIL_HIGH (_AC(0x7ff, U) << 12)
 #define _PAGE_NX       (cpu_has_nx ? _PAGE_NX_BIT : 0)
 
+#define PAGE_CACHE_ATTRS (_PAGE_PAT | _PAGE_PCD | _PAGE_PWT)
+
 /*
  * Debug option: Ensure that granted mappings are not implicitly unmapped.
  * WARNING: This will need to be disabled to run OSes that use the spare PTE
-- 
2.11.0


* [PATCH v3 05/21] x86/mm: document the return values from get_page_from_l*e
  2017-07-20 16:04 [PATCH v3 00/21] x86: refactor mm.c (the easy part) Wei Liu
                   ` (3 preceding siblings ...)
  2017-07-20 16:04 ` [PATCH v3 04/21] x86/mm: lift PAGE_CACHE_ATTRS to page.h Wei Liu
@ 2017-07-20 16:04 ` Wei Liu
  2017-07-20 16:04 ` [PATCH v3 06/21] x86: move pv_emul_is_mem_write to pv/emulate.c Wei Liu
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-20 16:04 UTC
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
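
With these comments in place, a caller of e.g. get_page_from_l2e() has a
single convention to test against (a hypothetical caller, for illustration
only; the real callers are in the page table validation paths):

    int rc = get_page_from_l2e(l2e, pfn, d);

    if ( rc < 0 )
        /* error code, e.g. -EINVAL: propagate the failure */;
    else if ( rc > 0 )
        /* page not present: no reference was taken */;
    else
        /* success */;
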
---
 xen/arch/x86/mm.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 8d7ceff9c8..141d1fc046 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -894,6 +894,12 @@ static int print_mmio_emul_range(unsigned long s, unsigned long e, void *arg)
 }
 #endif
 
+/*
+ * get_page_from_l1e returns:
+ *   0  => success (page not present also counts as such)
+ *  <0  => error code
+ *  >0  => the page flags to be flipped
+ */
 int
 get_page_from_l1e(
     l1_pgentry_t l1e, struct domain *l1e_owner, struct domain *pg_owner)
@@ -1106,6 +1112,12 @@ get_page_from_l1e(
 
 
 /* NB. Virtual address 'l2e' maps to a machine address within frame 'pfn'. */
+/*
+ * get_page_from_l2e returns:
+ *   1 => page not present
+ *   0 => success
+ *  <0 => error code
+ */
 define_get_linear_pagetable(l2);
 static int
 get_page_from_l2e(
@@ -1149,6 +1161,12 @@ get_page_from_l2e(
 }
 
 
+/*
+ * get_page_from_l3e returns:
+ *   1 => page not present
+ *   0 => success
+ *  <0 => error code
+ */
 define_get_linear_pagetable(l3);
 static int
 get_page_from_l3e(
@@ -1176,6 +1194,12 @@ get_page_from_l3e(
     return rc;
 }
 
+/*
+ * get_page_from_l4e returns:
+ *   1 => page not present
+ *   0 => success
+ *  <0 => error code
+ */
 define_get_linear_pagetable(l4);
 static int
 get_page_from_l4e(
-- 
2.11.0


* [PATCH v3 06/21] x86: move pv_emul_is_mem_write to pv/emulate.c
  2017-07-20 16:04 [PATCH v3 00/21] x86: refactor mm.c (the easy part) Wei Liu
                   ` (4 preceding siblings ...)
  2017-07-20 16:04 ` [PATCH v3 05/21] x86/mm: document the return values from get_page_from_l*e Wei Liu
@ 2017-07-20 16:04 ` Wei Liu
  2017-07-20 16:04 ` [PATCH v3 07/21] x86/mm: move and rename guest_get_eff{,kern}_l1e Wei Liu
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-20 16:04 UTC
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Export it via pv/emulate.h.  This requires including pv/emulate.h in
x86/mm.c for the time being.

The function will be used by different emulation handlers in later
patches.
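
Its signature matches the x86_emulate_ops ->validate hook, which is how the
handlers consume it, roughly as below (a sketch with a placeholder ops name;
the concrete structures appear in later patches):

    static const struct x86_emulate_ops example_emulate_ops = {
        /* ... read/write/cmpxchg hooks ... */
        .validate = pv_emul_is_mem_write,
    };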

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c         | 9 ++-------
 xen/arch/x86/pv/emulate.c | 7 +++++++
 xen/arch/x86/pv/emulate.h | 3 +++
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 141d1fc046..102b607c78 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -126,6 +126,8 @@
 #include <asm/hvm/grant_table.h>
 #include <asm/pv/grant_table.h>
 
+#include "pv/emulate.h"
+
 /* Mapping of the fixmap space needed early. */
 l1_pgentry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
     l1_fixmap[L1_PAGETABLE_ENTRIES];
@@ -5366,13 +5368,6 @@ static int ptwr_emulated_cmpxchg(
         container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
 }
 
-static int pv_emul_is_mem_write(const struct x86_emulate_state *state,
-                                struct x86_emulate_ctxt *ctxt)
-{
-    return x86_insn_is_mem_write(state, ctxt) ? X86EMUL_OKAY
-                                              : X86EMUL_UNHANDLEABLE;
-}
-
 static const struct x86_emulate_ops ptwr_emulate_ops = {
     .read       = ptwr_emulated_read,
     .insn_fetch = ptwr_emulated_read,
diff --git a/xen/arch/x86/pv/emulate.c b/xen/arch/x86/pv/emulate.c
index 5750c7699b..1c4d6eab28 100644
--- a/xen/arch/x86/pv/emulate.c
+++ b/xen/arch/x86/pv/emulate.c
@@ -87,6 +87,13 @@ void pv_emul_instruction_done(struct cpu_user_regs *regs, unsigned long rip)
     }
 }
 
+int pv_emul_is_mem_write(const struct x86_emulate_state *state,
+                         struct x86_emulate_ctxt *ctxt)
+{
+    return x86_insn_is_mem_write(state, ctxt) ? X86EMUL_OKAY
+                                              : X86EMUL_UNHANDLEABLE;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/pv/emulate.h b/xen/arch/x86/pv/emulate.h
index b2b1192d48..89abbe010f 100644
--- a/xen/arch/x86/pv/emulate.h
+++ b/xen/arch/x86/pv/emulate.h
@@ -7,4 +7,7 @@ int pv_emul_read_descriptor(unsigned int sel, const struct vcpu *v,
 
 void pv_emul_instruction_done(struct cpu_user_regs *regs, unsigned long rip);
 
+int pv_emul_is_mem_write(const struct x86_emulate_state *state,
+                         struct x86_emulate_ctxt *ctxt);
+
 #endif /* __PV_EMULATE_H__ */
-- 
2.11.0


* [PATCH v3 07/21] x86/mm: move and rename guest_get_eff{,kern}_l1e
  2017-07-20 16:04 [PATCH v3 00/21] x86: refactor mm.c (the easy part) Wei Liu
                   ` (5 preceding siblings ...)
  2017-07-20 16:04 ` [PATCH v3 06/21] x86: move pv_emul_is_mem_write to pv/emulate.c Wei Liu
@ 2017-07-20 16:04 ` Wei Liu
  2017-07-20 16:04 ` [PATCH v3 08/21] x86/mm: export get_page_from_pagenr Wei Liu
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-20 16:04 UTC
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Move them to pv/mm.c and rename them to pv_get_guest_eff_{,kern}_l1e.
Export them via pv/mm.h.

They will be used later in emulation handlers.
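
The fault handlers follow a common pattern with these helpers (a sketch
distilled from the ptwr_do_page_fault() hunk in this patch):

    l1_pgentry_t pte;

    /* Read the guest PTE mapping the faulting address via the linear
     * pagetable. */
    pv_get_guest_eff_l1e(addr, &pte);
    if ( !(l1e_get_flags(pte) & _PAGE_PRESENT) )
        return 0; /* not a mapping this handler cares about */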

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c           | 38 +++----------------------
 xen/arch/x86/pv/Makefile    |  1 +
 xen/arch/x86/pv/mm.c        | 67 +++++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/pv/mm.h | 53 +++++++++++++++++++++++++++++++++++
 4 files changed, 125 insertions(+), 34 deletions(-)
 create mode 100644 xen/arch/x86/pv/mm.c
 create mode 100644 xen/include/asm-x86/pv/mm.h

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 102b607c78..d264f76684 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -125,6 +125,7 @@
 
 #include <asm/hvm/grant_table.h>
 #include <asm/pv/grant_table.h>
+#include <asm/pv/mm.h>
 
 #include "pv/emulate.h"
 
@@ -577,37 +578,6 @@ static inline void guest_unmap_l1e(void *p)
     unmap_domain_page(p);
 }
 
-/* Read a PV guest's l1e that maps this virtual address. */
-static inline void guest_get_eff_l1e(unsigned long addr, l1_pgentry_t *eff_l1e)
-{
-    ASSERT(!paging_mode_translate(current->domain));
-    ASSERT(!paging_mode_external(current->domain));
-
-    if ( unlikely(!__addr_ok(addr)) ||
-         __copy_from_user(eff_l1e,
-                          &__linear_l1_table[l1_linear_offset(addr)],
-                          sizeof(l1_pgentry_t)) )
-        *eff_l1e = l1e_empty();
-}
-
-/*
- * Read the guest's l1e that maps this address, from the kernel-mode
- * page tables.
- */
-static inline void guest_get_eff_kern_l1e(struct vcpu *v, unsigned long addr,
-                                          void *eff_l1e)
-{
-    const bool user_mode = !(v->arch.flags & TF_kernel_mode);
-
-    if ( user_mode )
-        toggle_guest_mode(v);
-
-    guest_get_eff_l1e(addr, eff_l1e);
-
-    if ( user_mode )
-        toggle_guest_mode(v);
-}
-
 static inline void page_set_tlbflush_timestamp(struct page_info *page)
 {
     /*
@@ -692,7 +662,7 @@ int map_ldt_shadow_page(unsigned int off)
 
     if ( is_pv_32bit_domain(d) )
         gva = (u32)gva;
-    guest_get_eff_kern_l1e(v, gva, &l1e);
+    pv_get_guest_eff_kern_l1e(v, gva, &l1e);
     if ( unlikely(!(l1e_get_flags(l1e) & _PAGE_PRESENT)) )
         return 0;
 
@@ -5396,7 +5366,7 @@ int ptwr_do_page_fault(struct vcpu *v, unsigned long addr,
     int rc;
 
     /* Attempt to read the PTE that maps the VA being accessed. */
-    guest_get_eff_l1e(addr, &pte);
+    pv_get_guest_eff_l1e(addr, &pte);
 
     /* We are looking only for read-only mappings of p.t. pages. */
     if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) ||
@@ -5551,7 +5521,7 @@ int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr,
     int rc;
 
     /* Attempt to read the PTE that maps the VA being accessed. */
-    guest_get_eff_l1e(addr, &pte);
+    pv_get_guest_eff_l1e(addr, &pte);
 
     /* We are looking only for read-only mappings of MMIO pages. */
     if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) )
diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile
index 4e15484471..c83aed493b 100644
--- a/xen/arch/x86/pv/Makefile
+++ b/xen/arch/x86/pv/Makefile
@@ -7,6 +7,7 @@ obj-y += emul-priv-op.o
 obj-y += hypercall.o
 obj-y += iret.o
 obj-y += misc-hypercalls.o
+obj-y += mm.o
 obj-y += traps.o
 
 obj-bin-y += dom0_build.init.o
diff --git a/xen/arch/x86/pv/mm.c b/xen/arch/x86/pv/mm.c
new file mode 100644
index 0000000000..aa2ce34145
--- /dev/null
+++ b/xen/arch/x86/pv/mm.c
@@ -0,0 +1,67 @@
+/******************************************************************************
+ * arch/x86/pv/mm.c
+ *
+ * Memory management code for PV guests
+ *
+ * Copyright (c) 2002-2005 K A Fraser
+ * Copyright (c) 2004 Christian Limpach
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/guest_access.h>
+
+#include <asm/pv/mm.h>
+
+
+/* Read a PV guest's l1e that maps this virtual address. */
+void pv_get_guest_eff_l1e(unsigned long addr, l1_pgentry_t *eff_l1e)
+{
+    ASSERT(!paging_mode_translate(current->domain));
+    ASSERT(!paging_mode_external(current->domain));
+
+    if ( unlikely(!__addr_ok(addr)) ||
+         __copy_from_user(eff_l1e,
+                          &__linear_l1_table[l1_linear_offset(addr)],
+                          sizeof(l1_pgentry_t)) )
+        *eff_l1e = l1e_empty();
+}
+
+/*
+ * Read the guest's l1e that maps this address, from the kernel-mode
+ * page tables.
+ */
+void pv_get_guest_eff_kern_l1e(struct vcpu *v, unsigned long addr,
+                               void *eff_l1e)
+{
+    const bool user_mode = !(v->arch.flags & TF_kernel_mode);
+
+    if ( user_mode )
+        toggle_guest_mode(v);
+
+    pv_get_guest_eff_l1e(addr, eff_l1e);
+
+    if ( user_mode )
+        toggle_guest_mode(v);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/pv/mm.h b/xen/include/asm-x86/pv/mm.h
new file mode 100644
index 0000000000..19dbc3b66c
--- /dev/null
+++ b/xen/include/asm-x86/pv/mm.h
@@ -0,0 +1,53 @@
+/*
+ * asm-x86/pv/mm.h
+ *
+ * Memory management interfaces for PV guests
+ *
+ * Copyright (C) 2017 Wei Liu <wei.liu2@citrix.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __X86_PV_MM_H__
+#define __X86_PV_MM_H__
+
+#ifdef CONFIG_PV
+
+void pv_get_guest_eff_l1e(unsigned long addr, l1_pgentry_t *eff_l1e);
+
+void pv_get_guest_eff_kern_l1e(struct vcpu *v, unsigned long addr,
+                               void *eff_l1e);
+
+#else
+
+static inline void pv_get_guest_eff_l1e(unsigned long addr,
+                                        l1_pgentry_t *eff_l1e)
+{}
+
+static inline void pv_get_guest_eff_kern_l1e(struct vcpu *v, unsigned long addr,
+                                             void *eff_l1e)
+{}
+
+#endif
+
+#endif /* __X86_PV_MM_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.11.0


* [PATCH v3 08/21] x86/mm: export get_page_from_pagenr
  2017-07-20 16:04 [PATCH v3 00/21] x86: refactor mm.c (the easy part) Wei Liu
                   ` (6 preceding siblings ...)
  2017-07-20 16:04 ` [PATCH v3 07/21] x86/mm: move and rename guest_get_eff{,kern}_l1e Wei Liu
@ 2017-07-20 16:04 ` Wei Liu
  2017-07-20 16:04 ` [PATCH v3 09/21] x86/mm: rename and move update_intpte Wei Liu
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-20 16:04 UTC
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

It will be used by different files later, so export it via
asm-x86/mm.h.

Make it return 0 on success and -EINVAL on failure to match the other
get_page_from_* functions. Fix all call sites accordingly.
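
The change inverts the truth test at every call site, e.g. (the pattern
taken from the hunks below):

    /* Before: non-zero meant success. */
    if ( !get_page_from_pagenr(gl1mfn, current->domain) )
        return GNTST_general_error;

    /* After: 0 means success, -EINVAL failure. */
    if ( get_page_from_pagenr(gl1mfn, current->domain) )
        return GNTST_general_error;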

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c        | 26 +++++++++++++-------------
 xen/include/asm-x86/mm.h |  1 +
 2 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index d264f76684..472f0d40d5 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -689,7 +689,7 @@ int map_ldt_shadow_page(unsigned int off)
 }
 
 
-static int get_page_from_pagenr(unsigned long page_nr, struct domain *d)
+int get_page_from_pagenr(unsigned long page_nr, struct domain *d)
 {
     struct page_info *page = mfn_to_page(page_nr);
 
@@ -697,10 +697,10 @@ static int get_page_from_pagenr(unsigned long page_nr, struct domain *d)
     {
         gdprintk(XENLOG_WARNING,
                  "Could not get page ref for mfn %"PRI_mfn"\n", page_nr);
-        return 0;
+        return -EINVAL;
     }
 
-    return 1;
+    return 0;
 }
 
 
@@ -714,7 +714,7 @@ static int get_page_and_type_from_pagenr(unsigned long page_nr,
     int rc;
 
     if ( likely(partial >= 0) &&
-         unlikely(!get_page_from_pagenr(page_nr, d)) )
+         unlikely(get_page_from_pagenr(page_nr, d)) )
         return -EINVAL;
 
     rc = (preemptible ?
@@ -768,7 +768,7 @@ get_##level##_linear_pagetable(                                             \
     if ( (pfn = level##e_get_pfn(pde)) != pde_pfn )                         \
     {                                                                       \
         /* Make sure the mapped frame belongs to the correct domain. */     \
-        if ( unlikely(!get_page_from_pagenr(pfn, d)) )                      \
+        if ( unlikely(get_page_from_pagenr(pfn, d)) )                       \
             return 0;                                                       \
                                                                             \
         /*                                                                  \
@@ -2998,7 +2998,7 @@ int new_guest_cr3(unsigned long mfn)
     }
 
     rc = paging_mode_refcounts(d)
-         ? (get_page_from_pagenr(mfn, d) ? 0 : -EINVAL)
+         ? (!get_page_from_pagenr(mfn, d) ? 0 : -EINVAL)
          : get_page_and_type_from_pagenr(mfn, PGT_root_page_table, d, 0, 1);
     switch ( rc )
     {
@@ -3921,7 +3921,7 @@ long do_mmu_update(
                 xsm_checked = xsm_needed;
             }
 
-            if ( unlikely(!get_page_from_pagenr(mfn, pg_owner)) )
+            if ( unlikely(get_page_from_pagenr(mfn, pg_owner)) )
             {
                 gdprintk(XENLOG_WARNING,
                          "Could not get page for mach->phys update\n");
@@ -4135,7 +4135,7 @@ static int create_grant_va_mapping(
         return GNTST_general_error;
     }
 
-    if ( !get_page_from_pagenr(gl1mfn, current->domain) )
+    if ( get_page_from_pagenr(gl1mfn, current->domain) )
     {
         guest_unmap_l1e(pl1e);
         return GNTST_general_error;
@@ -4185,7 +4185,7 @@ static int replace_grant_va_mapping(
         return GNTST_general_error;
     }
 
-    if ( !get_page_from_pagenr(gl1mfn, current->domain) )
+    if ( get_page_from_pagenr(gl1mfn, current->domain) )
     {
         rc = GNTST_general_error;
         goto out;
@@ -4295,7 +4295,7 @@ int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
         return GNTST_general_error;
     }
 
-    if ( !get_page_from_pagenr(gl1mfn, current->domain) )
+    if ( get_page_from_pagenr(gl1mfn, current->domain) )
     {
         guest_unmap_l1e(pl1e);
         return GNTST_general_error;
@@ -4466,7 +4466,7 @@ static int __do_update_va_mapping(
 
     rc = -EINVAL;
     pl1e = guest_map_l1e(va, &gl1mfn);
-    if ( unlikely(!pl1e || !get_page_from_pagenr(gl1mfn, d)) )
+    if ( unlikely(!pl1e || get_page_from_pagenr(gl1mfn, d)) )
         goto out;
 
     gl1pg = mfn_to_page(gl1mfn);
@@ -4838,7 +4838,7 @@ int xenmem_add_to_physmap_one(
                 put_gfn(d, gfn);
                 return -ENOMEM;
             }
-            if ( !get_page_from_pagenr(idx, d) )
+            if ( get_page_from_pagenr(idx, d) )
                 break;
             mfn = idx;
             page = mfn_to_page(mfn);
@@ -5371,7 +5371,7 @@ int ptwr_do_page_fault(struct vcpu *v, unsigned long addr,
     /* We are looking only for read-only mappings of p.t. pages. */
     if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) ||
          rangeset_contains_singleton(mmio_ro_ranges, l1e_get_pfn(pte)) ||
-         !get_page_from_pagenr(l1e_get_pfn(pte), d) )
+         get_page_from_pagenr(l1e_get_pfn(pte), d) )
         goto bail;
 
     page = l1e_get_page(pte);
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 2550e35f85..6fc1e7d5ca 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -363,6 +363,7 @@ int  put_old_guest_table(struct vcpu *);
 int  get_page_from_l1e(
     l1_pgentry_t l1e, struct domain *l1e_owner, struct domain *pg_owner);
 void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner);
+int get_page_from_pagenr(unsigned long page_nr, struct domain *d);
 
 static inline void put_page_and_type(struct page_info *page)
 {
-- 
2.11.0


* [PATCH v3 09/21] x86/mm: rename and move update_intpte
  2017-07-20 16:04 [PATCH v3 00/21] x86: refactor mm.c (the easy part) Wei Liu
                   ` (7 preceding siblings ...)
  2017-07-20 16:04 ` [PATCH v3 08/21] x86/mm: export get_page_from_pagenr Wei Liu
@ 2017-07-20 16:04 ` Wei Liu
  2017-07-20 16:04 ` [PATCH v3 10/21] x86/mm: move {un,}adjust_guest_* to pv/mm.h Wei Liu
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-20 16:04 UTC
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

That function is only used by code supporting PV guests, so add the pv_
prefix.

Export it via pv/mm.h. Move UPDATE_ENTRY as well.
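
UPDATE_ENTRY remains a thin type-dispatch wrapper; for example, the l1
variant

    UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0)

expands to

    pv_update_intpte(&l1e_get_intpte(*pl1e),
                     l1e_get_intpte(ol1e), l1e_get_intpte(nl1e),
                     gl1mfn, v, 0);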

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
Now it is no longer an inline function, but I don't think that matters
much.
---
 xen/arch/x86/mm.c           | 65 ---------------------------------------------
 xen/arch/x86/pv/mm.c        | 54 +++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/pv/mm.h | 17 ++++++++++++
 3 files changed, 71 insertions(+), 65 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 472f0d40d5..fbf3b31051 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -133,14 +133,6 @@
 l1_pgentry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
     l1_fixmap[L1_PAGETABLE_ENTRIES];
 
-/*
- * PTE updates can be done with ordinary writes except:
- *  1. Debug builds get extra checking by using CMPXCHG[8B].
- */
-#if !defined(NDEBUG)
-#define PTE_UPDATE_WITH_CMPXCHG
-#endif
-
 paddr_t __read_mostly mem_hotplug;
 
 /* Private domain structs for DOMID_XEN and DOMID_IO. */
@@ -1846,63 +1838,6 @@ void page_unlock(struct page_info *page)
     } while ( (y = cmpxchg(&page->u.inuse.type_info, x, nx)) != x );
 }
 
-/*
- * How to write an entry to the guest pagetables.
- * Returns false for failure (pointer not valid), true for success.
- */
-static inline bool update_intpte(
-    intpte_t *p, intpte_t old, intpte_t new, unsigned long mfn,
-    struct vcpu *v, int preserve_ad)
-{
-    bool rv = true;
-
-#ifndef PTE_UPDATE_WITH_CMPXCHG
-    if ( !preserve_ad )
-    {
-        rv = paging_write_guest_entry(v, p, new, _mfn(mfn));
-    }
-    else
-#endif
-    {
-        intpte_t t = old;
-
-        for ( ; ; )
-        {
-            intpte_t _new = new;
-
-            if ( preserve_ad )
-                _new |= old & (_PAGE_ACCESSED | _PAGE_DIRTY);
-
-            rv = paging_cmpxchg_guest_entry(v, p, &t, _new, _mfn(mfn));
-            if ( unlikely(rv == 0) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Failed to update %" PRIpte " -> %" PRIpte
-                         ": saw %" PRIpte "\n", old, _new, t);
-                break;
-            }
-
-            if ( t == old )
-                break;
-
-            /* Allowed to change in Accessed/Dirty flags only. */
-            BUG_ON((t ^ old) & ~(intpte_t)(_PAGE_ACCESSED|_PAGE_DIRTY));
-
-            old = t;
-        }
-    }
-    return rv;
-}
-
-/*
- * Macro that wraps the appropriate type-changes around update_intpte().
- * Arguments are: type, ptr, old, new, mfn, vcpu
- */
-#define UPDATE_ENTRY(_t,_p,_o,_n,_m,_v,_ad)                         \
-    update_intpte(&_t ## e_get_intpte(*(_p)),                       \
-                  _t ## e_get_intpte(_o), _t ## e_get_intpte(_n),   \
-                  (_m), (_v), (_ad))
-
 /*
  * PTE flags that a guest may change without re-validating the PTE.
  * All other bits affect translation, caching, or Xen's safety.
diff --git a/xen/arch/x86/pv/mm.c b/xen/arch/x86/pv/mm.c
index aa2ce34145..2cb5995e62 100644
--- a/xen/arch/x86/pv/mm.c
+++ b/xen/arch/x86/pv/mm.c
@@ -24,6 +24,13 @@
 
 #include <asm/pv/mm.h>
 
+/*
+ * PTE updates can be done with ordinary writes except:
+ *  1. Debug builds get extra checking by using CMPXCHG[8B].
+ */
+#if !defined(NDEBUG)
+#define PTE_UPDATE_WITH_CMPXCHG
+#endif
 
 /* Read a PV guest's l1e that maps this virtual address. */
 void pv_get_guest_eff_l1e(unsigned long addr, l1_pgentry_t *eff_l1e)
@@ -56,6 +63,53 @@ void pv_get_guest_eff_kern_l1e(struct vcpu *v, unsigned long addr,
         toggle_guest_mode(v);
 }
 
+/*
+ * How to write an entry to the guest pagetables.
+ * Returns false for failure (pointer not valid), true for success.
+ */
+bool pv_update_intpte(intpte_t *p, intpte_t old, intpte_t new,
+                      unsigned long mfn, struct vcpu *v, int preserve_ad)
+{
+    bool rv = true;
+
+#ifndef PTE_UPDATE_WITH_CMPXCHG
+    if ( !preserve_ad )
+    {
+        rv = paging_write_guest_entry(v, p, new, _mfn(mfn));
+    }
+    else
+#endif
+    {
+        intpte_t t = old;
+
+        for ( ; ; )
+        {
+            intpte_t _new = new;
+
+            if ( preserve_ad )
+                _new |= old & (_PAGE_ACCESSED | _PAGE_DIRTY);
+
+            rv = paging_cmpxchg_guest_entry(v, p, &t, _new, _mfn(mfn));
+            if ( unlikely(rv == 0) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Failed to update %" PRIpte " -> %" PRIpte
+                         ": saw %" PRIpte "\n", old, _new, t);
+                break;
+            }
+
+            if ( t == old )
+                break;
+
+            /* Allowed to change in Accessed/Dirty flags only. */
+            BUG_ON((t ^ old) & ~(intpte_t)(_PAGE_ACCESSED|_PAGE_DIRTY));
+
+            old = t;
+        }
+    }
+    return rv;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/pv/mm.h b/xen/include/asm-x86/pv/mm.h
index 19dbc3b66c..ae85a9ca1a 100644
--- a/xen/include/asm-x86/pv/mm.h
+++ b/xen/include/asm-x86/pv/mm.h
@@ -21,6 +21,7 @@
 #ifndef __X86_PV_MM_H__
 #define __X86_PV_MM_H__
 
+
 #ifdef CONFIG_PV
 
 void pv_get_guest_eff_l1e(unsigned long addr, l1_pgentry_t *eff_l1e);
@@ -28,6 +29,17 @@ void pv_get_guest_eff_l1e(unsigned long addr, l1_pgentry_t *eff_l1e);
 void pv_get_guest_eff_kern_l1e(struct vcpu *v, unsigned long addr,
                                void *eff_l1e);
 
+bool pv_update_intpte(intpte_t *p, intpte_t old, intpte_t new,
+                      unsigned long mfn, struct vcpu *v, int preserve_ad);
+/*
+ * Macro that wraps the appropriate type-changes around pv_update_intpte().
+ * Arguments are: type, ptr, old, new, mfn, vcpu, preserve_ad
+ */
+#define UPDATE_ENTRY(_t,_p,_o,_n,_m,_v,_ad)                            \
+    pv_update_intpte(&_t ## e_get_intpte(*(_p)),                       \
+                     _t ## e_get_intpte(_o), _t ## e_get_intpte(_n),   \
+                     (_m), (_v), (_ad))
+
 #else
 
 static inline void pv_get_guest_eff_l1e(unsigned long addr,
@@ -38,6 +50,11 @@ static inline void pv_get_guest_eff_kern_l1e(struct vcpu *v, unsigned long addr,
                                              void *eff_l1e)
 {}
 
+static inline bool pv_update_intpte(intpte_t *p, intpte_t old, intpte_t new,
+                                    unsigned long mfn, struct vcpu *v,
+                                    int preserve_ad)
+{ return false; }
+
 #endif
 
 #endif /* __X86_PV_MM_H__ */
-- 
2.11.0


* [PATCH v3 10/21] x86/mm: move {un,}adjust_guest_* to pv/mm.h
  2017-07-20 16:04 [PATCH v3 00/21] x86: refactor mm.c (the easy part) Wei Liu
                   ` (8 preceding siblings ...)
  2017-07-20 16:04 ` [PATCH v3 09/21] x86/mm: rename and move update_intpte Wei Liu
@ 2017-07-20 16:04 ` Wei Liu
  2017-07-20 16:04 ` [PATCH v3 11/21] x86/mm: split out writable pagetable emulation code Wei Liu
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-20 16:04 UTC
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Those macros will soon be used in different files. They are PV-specific,
so move them to pv/mm.h.
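
For context, these adjustments are applied when validating guest PTE
updates, along the lines of the mod_l1_entry() pattern (a sketch, not code
touched by this patch):

    /* Force _PAGE_USER (and friends) on for 64-bit PV guests. */
    adjust_guest_l1e(nl1e, d);
    if ( !UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, preserve_ad) )
        rc = -EBUSY;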

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c           | 47 ---------------------------------------------
 xen/include/asm-x86/pv/mm.h | 47 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index fbf3b31051..26b0bd4212 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1189,53 +1189,6 @@ get_page_from_l4e(
     return rc;
 }
 
-#define adjust_guest_l1e(pl1e, d)                                            \
-    do {                                                                     \
-        if ( likely(l1e_get_flags((pl1e)) & _PAGE_PRESENT) &&                \
-             likely(!is_pv_32bit_domain(d)) )                                \
-        {                                                                    \
-            /* _PAGE_GUEST_KERNEL page cannot have the Global bit set. */    \
-            if ( (l1e_get_flags((pl1e)) & (_PAGE_GUEST_KERNEL|_PAGE_GLOBAL)) \
-                 == (_PAGE_GUEST_KERNEL|_PAGE_GLOBAL) )                      \
-                gdprintk(XENLOG_WARNING,                                     \
-                         "Global bit is set to kernel page %lx\n",           \
-                         l1e_get_pfn((pl1e)));                               \
-            if ( !(l1e_get_flags((pl1e)) & _PAGE_USER) )                     \
-                l1e_add_flags((pl1e), (_PAGE_GUEST_KERNEL|_PAGE_USER));      \
-            if ( !(l1e_get_flags((pl1e)) & _PAGE_GUEST_KERNEL) )             \
-                l1e_add_flags((pl1e), (_PAGE_GLOBAL|_PAGE_USER));            \
-        }                                                                    \
-    } while ( 0 )
-
-#define adjust_guest_l2e(pl2e, d)                               \
-    do {                                                        \
-        if ( likely(l2e_get_flags((pl2e)) & _PAGE_PRESENT) &&   \
-             likely(!is_pv_32bit_domain(d)) )                   \
-            l2e_add_flags((pl2e), _PAGE_USER);                  \
-    } while ( 0 )
-
-#define adjust_guest_l3e(pl3e, d)                                   \
-    do {                                                            \
-        if ( likely(l3e_get_flags((pl3e)) & _PAGE_PRESENT) )        \
-            l3e_add_flags((pl3e), likely(!is_pv_32bit_domain(d)) ?  \
-                                         _PAGE_USER :               \
-                                         _PAGE_USER|_PAGE_RW);      \
-    } while ( 0 )
-
-#define adjust_guest_l4e(pl4e, d)                               \
-    do {                                                        \
-        if ( likely(l4e_get_flags((pl4e)) & _PAGE_PRESENT) &&   \
-             likely(!is_pv_32bit_domain(d)) )                   \
-            l4e_add_flags((pl4e), _PAGE_USER);                  \
-    } while ( 0 )
-
-#define unadjust_guest_l3e(pl3e, d)                                         \
-    do {                                                                    \
-        if ( unlikely(is_pv_32bit_domain(d)) &&                             \
-             likely(l3e_get_flags((pl3e)) & _PAGE_PRESENT) )                \
-            l3e_remove_flags((pl3e), _PAGE_USER|_PAGE_RW|_PAGE_ACCESSED);   \
-    } while ( 0 )
-
 void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
 {
     unsigned long     pfn = l1e_get_pfn(l1e);
diff --git a/xen/include/asm-x86/pv/mm.h b/xen/include/asm-x86/pv/mm.h
index ae85a9ca1a..4931bccb29 100644
--- a/xen/include/asm-x86/pv/mm.h
+++ b/xen/include/asm-x86/pv/mm.h
@@ -24,6 +24,53 @@
 
 #ifdef CONFIG_PV
 
+#define adjust_guest_l1e(pl1e, d)                                            \
+    do {                                                                     \
+        if ( likely(l1e_get_flags((pl1e)) & _PAGE_PRESENT) &&                \
+             likely(!is_pv_32bit_domain(d)) )                                \
+        {                                                                    \
+            /* _PAGE_GUEST_KERNEL page cannot have the Global bit set. */    \
+            if ( (l1e_get_flags((pl1e)) & (_PAGE_GUEST_KERNEL|_PAGE_GLOBAL)) \
+                 == (_PAGE_GUEST_KERNEL|_PAGE_GLOBAL) )                      \
+                gdprintk(XENLOG_WARNING,                                     \
+                         "Global bit is set to kernel page %lx\n",           \
+                         l1e_get_pfn((pl1e)));                               \
+            if ( !(l1e_get_flags((pl1e)) & _PAGE_USER) )                     \
+                l1e_add_flags((pl1e), (_PAGE_GUEST_KERNEL|_PAGE_USER));      \
+            if ( !(l1e_get_flags((pl1e)) & _PAGE_GUEST_KERNEL) )             \
+                l1e_add_flags((pl1e), (_PAGE_GLOBAL|_PAGE_USER));            \
+        }                                                                    \
+    } while ( 0 )
+
+#define adjust_guest_l2e(pl2e, d)                               \
+    do {                                                        \
+        if ( likely(l2e_get_flags((pl2e)) & _PAGE_PRESENT) &&   \
+             likely(!is_pv_32bit_domain(d)) )                   \
+            l2e_add_flags((pl2e), _PAGE_USER);                  \
+    } while ( 0 )
+
+#define adjust_guest_l3e(pl3e, d)                                   \
+    do {                                                            \
+        if ( likely(l3e_get_flags((pl3e)) & _PAGE_PRESENT) )        \
+            l3e_add_flags((pl3e), likely(!is_pv_32bit_domain(d)) ?  \
+                                         _PAGE_USER :               \
+                                         _PAGE_USER|_PAGE_RW);      \
+    } while ( 0 )
+
+#define adjust_guest_l4e(pl4e, d)                               \
+    do {                                                        \
+        if ( likely(l4e_get_flags((pl4e)) & _PAGE_PRESENT) &&   \
+             likely(!is_pv_32bit_domain(d)) )                   \
+            l4e_add_flags((pl4e), _PAGE_USER);                  \
+    } while ( 0 )
+
+#define unadjust_guest_l3e(pl3e, d)                                         \
+    do {                                                                    \
+        if ( unlikely(is_pv_32bit_domain(d)) &&                             \
+             likely(l3e_get_flags((pl3e)) & _PAGE_PRESENT) )                \
+            l3e_remove_flags((pl3e), _PAGE_USER|_PAGE_RW|_PAGE_ACCESSED);   \
+    } while ( 0 )
+
 void pv_get_guest_eff_l1e(unsigned long addr, l1_pgentry_t *eff_l1e);
 
 void pv_get_guest_eff_kern_l1e(struct vcpu *v, unsigned long addr,
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 11/21] x86/mm: split out writable pagetable emulation code
  2017-07-20 16:04 [PATCH v3 00/21] x86: refactor mm.c (the easy part) Wei Liu
                   ` (9 preceding siblings ...)
  2017-07-20 16:04 ` [PATCH v3 10/21] x86/mm: move {un,}adjust_guest_* to pv/mm.h Wei Liu
@ 2017-07-20 16:04 ` Wei Liu
  2017-07-20 16:04 ` [PATCH v3 12/21] x86/mm: split out readonly MMIO " Wei Liu
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-20 16:04 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Move the code to pv/emul-ptwr-op.c, fixing coding style issues in the
process.

Rename ptwr_emulated_read to pv_emul_ptwr_read and export it via
pv/emulate.h because it is needed by other emulation code.
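
As a reminder of the helper's contract (sketch only; __copy_from_user()
returns the number of bytes left uncopied, so the first inaccessible byte
is at addr + bytes - rc):

    unsigned int rc = __copy_from_user(p_data, (void *)addr, bytes);

    if ( rc )  /* rc bytes were not copied */
        x86_emul_pagefault(0 /* read fault */, addr + bytes - rc, ctxt);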

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c              | 308 +-------------------------------------
 xen/arch/x86/pv/Makefile       |   1 +
 xen/arch/x86/pv/emul-ptwr-op.c | 327 +++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/pv/emulate.h      |   2 +
 4 files changed, 332 insertions(+), 306 deletions(-)
 create mode 100644 xen/arch/x86/pv/emul-ptwr-op.c

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 26b0bd4212..548780aba6 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5014,310 +5014,6 @@ long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 }
 
 
-/*************************
- * Writable Pagetables
- */
-
-struct ptwr_emulate_ctxt {
-    struct x86_emulate_ctxt ctxt;
-    unsigned long cr2;
-    l1_pgentry_t  pte;
-};
-
-static int ptwr_emulated_read(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_data,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    unsigned int rc = bytes;
-    unsigned long addr = offset;
-
-    if ( !__addr_ok(addr) ||
-         (rc = __copy_from_user(p_data, (void *)addr, bytes)) )
-    {
-        x86_emul_pagefault(0, addr + bytes - rc, ctxt);  /* Read fault. */
-        return X86EMUL_EXCEPTION;
-    }
-
-    return X86EMUL_OKAY;
-}
-
-static int ptwr_emulated_update(
-    unsigned long addr,
-    paddr_t old,
-    paddr_t val,
-    unsigned int bytes,
-    unsigned int do_cmpxchg,
-    struct ptwr_emulate_ctxt *ptwr_ctxt)
-{
-    unsigned long mfn;
-    unsigned long unaligned_addr = addr;
-    struct page_info *page;
-    l1_pgentry_t pte, ol1e, nl1e, *pl1e;
-    struct vcpu *v = current;
-    struct domain *d = v->domain;
-    int ret;
-
-    /* Only allow naturally-aligned stores within the original %cr2 page. */
-    if ( unlikely(((addr ^ ptwr_ctxt->cr2) & PAGE_MASK) ||
-                  (addr & (bytes - 1))) )
-    {
-        gdprintk(XENLOG_WARNING, "bad access (cr2=%lx, addr=%lx, bytes=%u)\n",
-                 ptwr_ctxt->cr2, addr, bytes);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    /* Turn a sub-word access into a full-word access. */
-    if ( bytes != sizeof(paddr_t) )
-    {
-        paddr_t      full;
-        unsigned int rc, offset = addr & (sizeof(paddr_t) - 1);
-
-        /* Align address; read full word. */
-        addr &= ~(sizeof(paddr_t) - 1);
-        if ( (rc = copy_from_user(&full, (void *)addr, sizeof(paddr_t))) != 0 )
-        {
-            x86_emul_pagefault(0, /* Read fault. */
-                               addr + sizeof(paddr_t) - rc,
-                               &ptwr_ctxt->ctxt);
-            return X86EMUL_EXCEPTION;
-        }
-        /* Mask out bits provided by caller. */
-        full &= ~((((paddr_t)1 << (bytes * 8)) - 1) << (offset * 8));
-        /* Shift the caller value and OR in the missing bits. */
-        val  &= (((paddr_t)1 << (bytes * 8)) - 1);
-        val <<= (offset) * 8;
-        val  |= full;
-        /* Also fill in missing parts of the cmpxchg old value. */
-        old  &= (((paddr_t)1 << (bytes * 8)) - 1);
-        old <<= (offset) * 8;
-        old  |= full;
-    }
-
-    pte  = ptwr_ctxt->pte;
-    mfn  = l1e_get_pfn(pte);
-    page = mfn_to_page(mfn);
-
-    /* We are looking only for read-only mappings of p.t. pages. */
-    ASSERT((l1e_get_flags(pte) & (_PAGE_RW|_PAGE_PRESENT)) == _PAGE_PRESENT);
-    ASSERT(mfn_valid(_mfn(mfn)));
-    ASSERT((page->u.inuse.type_info & PGT_type_mask) == PGT_l1_page_table);
-    ASSERT((page->u.inuse.type_info & PGT_count_mask) != 0);
-    ASSERT(page_get_owner(page) == d);
-
-    /* Check the new PTE. */
-    nl1e = l1e_from_intpte(val);
-    switch ( ret = get_page_from_l1e(nl1e, d, d) )
-    {
-    default:
-        if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
-             !do_cmpxchg && (l1e_get_flags(nl1e) & _PAGE_PRESENT) )
-        {
-            /*
-             * If this is an upper-half write to a PAE PTE then we assume that
-             * the guest has simply got the two writes the wrong way round. We
-             * zap the PRESENT bit on the assumption that the bottom half will
-             * be written immediately after we return to the guest.
-             */
-            gdprintk(XENLOG_DEBUG, "ptwr_emulate: fixing up invalid PAE PTE %"
-                     PRIpte"\n", l1e_get_intpte(nl1e));
-            l1e_remove_flags(nl1e, _PAGE_PRESENT);
-        }
-        else
-        {
-            gdprintk(XENLOG_WARNING, "could not get_page_from_l1e()\n");
-            return X86EMUL_UNHANDLEABLE;
-        }
-        break;
-    case 0:
-        break;
-    case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
-        ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
-        l1e_flip_flags(nl1e, ret);
-        break;
-    }
-
-    adjust_guest_l1e(nl1e, d);
-
-    /* Checked successfully: do the update (write or cmpxchg). */
-    pl1e = map_domain_page(_mfn(mfn));
-    pl1e = (l1_pgentry_t *)((unsigned long)pl1e + (addr & ~PAGE_MASK));
-    if ( do_cmpxchg )
-    {
-        bool okay;
-        intpte_t t = old;
-
-        ol1e = l1e_from_intpte(old);
-        okay = paging_cmpxchg_guest_entry(v, &l1e_get_intpte(*pl1e),
-                                          &t, l1e_get_intpte(nl1e), _mfn(mfn));
-        okay = (okay && t == old);
-
-        if ( !okay )
-        {
-            unmap_domain_page(pl1e);
-            put_page_from_l1e(nl1e, d);
-            return X86EMUL_RETRY;
-        }
-    }
-    else
-    {
-        ol1e = *pl1e;
-        if ( !UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn, v, 0) )
-            BUG();
-    }
-
-    trace_ptwr_emulation(addr, nl1e);
-
-    unmap_domain_page(pl1e);
-
-    /* Finally, drop the old PTE. */
-    put_page_from_l1e(ol1e, d);
-
-    return X86EMUL_OKAY;
-}
-
-static int ptwr_emulated_write(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_data,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    paddr_t val = 0;
-
-    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) || !bytes )
-    {
-        gdprintk(XENLOG_WARNING, "bad write size (addr=%lx, bytes=%u)\n",
-                 offset, bytes);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    memcpy(&val, p_data, bytes);
-
-    return ptwr_emulated_update(
-        offset, 0, val, bytes, 0,
-        container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
-}
-
-static int ptwr_emulated_cmpxchg(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_old,
-    void *p_new,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    paddr_t old = 0, new = 0;
-
-    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) )
-    {
-        gdprintk(XENLOG_WARNING, "bad cmpxchg size (addr=%lx, bytes=%u)\n",
-                 offset, bytes);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    memcpy(&old, p_old, bytes);
-    memcpy(&new, p_new, bytes);
-
-    return ptwr_emulated_update(
-        offset, old, new, bytes, 1,
-        container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
-}
-
-static const struct x86_emulate_ops ptwr_emulate_ops = {
-    .read       = ptwr_emulated_read,
-    .insn_fetch = ptwr_emulated_read,
-    .write      = ptwr_emulated_write,
-    .cmpxchg    = ptwr_emulated_cmpxchg,
-    .validate   = pv_emul_is_mem_write,
-    .cpuid      = pv_emul_cpuid,
-};
-
-/* Write page fault handler: check if guest is trying to modify a PTE. */
-int ptwr_do_page_fault(struct vcpu *v, unsigned long addr,
-                       struct cpu_user_regs *regs)
-{
-    struct domain *d = v->domain;
-    struct page_info *page;
-    l1_pgentry_t      pte;
-    struct ptwr_emulate_ctxt ptwr_ctxt = {
-        .ctxt = {
-            .regs = regs,
-            .vendor = d->arch.cpuid->x86_vendor,
-            .addr_size = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
-            .sp_size   = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
-            .lma       = !is_pv_32bit_domain(d),
-        },
-    };
-    int rc;
-
-    /* Attempt to read the PTE that maps the VA being accessed. */
-    pv_get_guest_eff_l1e(addr, &pte);
-
-    /* We are looking only for read-only mappings of p.t. pages. */
-    if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) ||
-         rangeset_contains_singleton(mmio_ro_ranges, l1e_get_pfn(pte)) ||
-         get_page_from_pagenr(l1e_get_pfn(pte), d) )
-        goto bail;
-
-    page = l1e_get_page(pte);
-    if ( !page_lock(page) )
-    {
-        put_page(page);
-        goto bail;
-    }
-
-    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-    {
-        page_unlock(page);
-        put_page(page);
-        goto bail;
-    }
-
-    ptwr_ctxt.cr2 = addr;
-    ptwr_ctxt.pte = pte;
-
-    rc = x86_emulate(&ptwr_ctxt.ctxt, &ptwr_emulate_ops);
-
-    page_unlock(page);
-    put_page(page);
-
-    switch ( rc )
-    {
-    case X86EMUL_EXCEPTION:
-        /*
-         * This emulation only covers writes to pagetables which are marked
-         * read-only by Xen.  We tolerate #PF (in case a concurrent pagetable
-         * update has succeeded on a different vcpu).  Anything else is an
-         * emulation bug, or a guest playing with the instruction stream under
-         * Xen's feet.
-         */
-        if ( ptwr_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
-             ptwr_ctxt.ctxt.event.vector == TRAP_page_fault )
-            pv_inject_event(&ptwr_ctxt.ctxt.event);
-        else
-            gdprintk(XENLOG_WARNING,
-                     "Unexpected event (type %u, vector %#x) from emulation\n",
-                     ptwr_ctxt.ctxt.event.type, ptwr_ctxt.ctxt.event.vector);
-
-        /* Fallthrough */
-    case X86EMUL_OKAY:
-
-        if ( ptwr_ctxt.ctxt.retire.singlestep )
-            pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
-
-        /* Fallthrough */
-    case X86EMUL_RETRY:
-        perfc_incr(ptwr_emulations);
-        return EXCRET_fault_fixed;
-    }
-
- bail:
-    return 0;
-}
-
 /*************************
  * fault handling for read-only MMIO pages
  */
@@ -5345,7 +5041,7 @@ int mmio_ro_emulated_write(
 
 static const struct x86_emulate_ops mmio_ro_emulate_ops = {
     .read       = x86emul_unhandleable_rw,
-    .insn_fetch = ptwr_emulated_read,
+    .insn_fetch = pv_emul_ptwr_read,
     .write      = mmio_ro_emulated_write,
     .validate   = pv_emul_is_mem_write,
     .cpuid      = pv_emul_cpuid,
@@ -5384,7 +5080,7 @@ int mmcfg_intercept_write(
 
 static const struct x86_emulate_ops mmcfg_intercept_ops = {
     .read       = x86emul_unhandleable_rw,
-    .insn_fetch = ptwr_emulated_read,
+    .insn_fetch = pv_emul_ptwr_read,
     .write      = mmcfg_intercept_write,
     .validate   = pv_emul_is_mem_write,
     .cpuid      = pv_emul_cpuid,
diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile
index c83aed493b..cbd890c5f2 100644
--- a/xen/arch/x86/pv/Makefile
+++ b/xen/arch/x86/pv/Makefile
@@ -4,6 +4,7 @@ obj-y += emulate.o
 obj-y += emul-gate-op.o
 obj-y += emul-inv-op.o
 obj-y += emul-priv-op.o
+obj-y += emul-ptwr-op.o
 obj-y += hypercall.o
 obj-y += iret.o
 obj-y += misc-hypercalls.o
diff --git a/xen/arch/x86/pv/emul-ptwr-op.c b/xen/arch/x86/pv/emul-ptwr-op.c
new file mode 100644
index 0000000000..1064bb63f5
--- /dev/null
+++ b/xen/arch/x86/pv/emul-ptwr-op.c
@@ -0,0 +1,327 @@
+/******************************************************************************
+ * arch/x86/pv/emul-ptwr-op.c
+ *
+ * Emulate writable pagetables for PV guests
+ *
+ * Copyright (c) 2002-2005 K A Fraser
+ * Copyright (c) 2004 Christian Limpach
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/guest_access.h>
+#include <xen/trace.h>
+
+#include <asm/pv/mm.h>
+
+#include "emulate.h"
+
+/*************************
+ * Writable Pagetables
+ */
+
+struct ptwr_emulate_ctxt {
+    struct x86_emulate_ctxt ctxt;
+    unsigned long cr2;
+    l1_pgentry_t  pte;
+};
+
+int pv_emul_ptwr_read(enum x86_segment seg, unsigned long offset, void *p_data,
+                      unsigned int bytes, struct x86_emulate_ctxt *ctxt)
+{
+    unsigned int rc = bytes;
+    unsigned long addr = offset;
+
+    if ( !__addr_ok(addr) ||
+         (rc = __copy_from_user(p_data, (void *)addr, bytes)) )
+    {
+        x86_emul_pagefault(0, addr + bytes - rc, ctxt);  /* Read fault. */
+        return X86EMUL_EXCEPTION;
+    }
+
+    return X86EMUL_OKAY;
+}
+
+static int ptwr_emulated_update(unsigned long addr, paddr_t old, paddr_t val,
+                                unsigned int bytes, unsigned int do_cmpxchg,
+                                struct ptwr_emulate_ctxt *ptwr_ctxt)
+{
+    unsigned long mfn;
+    unsigned long unaligned_addr = addr;
+    struct page_info *page;
+    l1_pgentry_t pte, ol1e, nl1e, *pl1e;
+    struct vcpu *v = current;
+    struct domain *d = v->domain;
+    int ret;
+
+    /* Only allow naturally-aligned stores within the original %cr2 page. */
+    if ( unlikely(((addr ^ ptwr_ctxt->cr2) & PAGE_MASK) ||
+                  (addr & (bytes - 1))) )
+    {
+        gdprintk(XENLOG_WARNING, "bad access (cr2=%lx, addr=%lx, bytes=%u)\n",
+                 ptwr_ctxt->cr2, addr, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    /* Turn a sub-word access into a full-word access. */
+    if ( bytes != sizeof(paddr_t) )
+    {
+        paddr_t      full;
+        unsigned int rc, offset = addr & (sizeof(paddr_t) - 1);
+
+        /* Align address; read full word. */
+        addr &= ~(sizeof(paddr_t) - 1);
+        if ( (rc = copy_from_user(&full, (void *)addr, sizeof(paddr_t))) != 0 )
+        {
+            x86_emul_pagefault(0, /* Read fault. */
+                               addr + sizeof(paddr_t) - rc,
+                               &ptwr_ctxt->ctxt);
+            return X86EMUL_EXCEPTION;
+        }
+        /* Mask out bits provided by caller. */
+        full &= ~((((paddr_t)1 << (bytes * 8)) - 1) << (offset * 8));
+        /* Shift the caller value and OR in the missing bits. */
+        val  &= (((paddr_t)1 << (bytes * 8)) - 1);
+        val <<= (offset) * 8;
+        val  |= full;
+        /* Also fill in missing parts of the cmpxchg old value. */
+        old  &= (((paddr_t)1 << (bytes * 8)) - 1);
+        old <<= (offset) * 8;
+        old  |= full;
+    }
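+    /*
+     * Worked example for the widening above (illustrative): on 64-bit,
+     * sizeof(paddr_t) == 8, so a 4-byte write at addr % 8 == 4 gives
+     * offset == 4; the mask clears bits 63:32 of 'full', 'val' is shifted
+     * left by 32 and ORed in, preserving the untouched low half of the
+     * PTE.  'old' is widened the same way for the cmpxchg case.
+     */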
+
+    pte  = ptwr_ctxt->pte;
+    mfn  = l1e_get_pfn(pte);
+    page = mfn_to_page(mfn);
+
+    /* We are looking only for read-only mappings of p.t. pages. */
+    ASSERT((l1e_get_flags(pte) & (_PAGE_RW|_PAGE_PRESENT)) == _PAGE_PRESENT);
+    ASSERT(mfn_valid(_mfn(mfn)));
+    ASSERT((page->u.inuse.type_info & PGT_type_mask) == PGT_l1_page_table);
+    ASSERT((page->u.inuse.type_info & PGT_count_mask) != 0);
+    ASSERT(page_get_owner(page) == d);
+
+    /* Check the new PTE. */
+    nl1e = l1e_from_intpte(val);
+    switch ( ret = get_page_from_l1e(nl1e, d, d) )
+    {
+    default:
+        if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
+             !do_cmpxchg && (l1e_get_flags(nl1e) & _PAGE_PRESENT) )
+        {
+            /*
+             * If this is an upper-half write to a PAE PTE then we assume that
+             * the guest has simply got the two writes the wrong way round. We
+             * zap the PRESENT bit on the assumption that the bottom half will
+             * be written immediately after we return to the guest.
+             */
+            gdprintk(XENLOG_DEBUG, "ptwr_emulate: fixing up invalid PAE PTE %"
+                     PRIpte"\n", l1e_get_intpte(nl1e));
+            l1e_remove_flags(nl1e, _PAGE_PRESENT);
+        }
+        else
+        {
+            gdprintk(XENLOG_WARNING, "could not get_page_from_l1e()\n");
+            return X86EMUL_UNHANDLEABLE;
+        }
+        break;
+    case 0:
+        break;
+    case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+        ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+        l1e_flip_flags(nl1e, ret);
+        break;
+    }
+
+    adjust_guest_l1e(nl1e, d);
+
+    /* Checked successfully: do the update (write or cmpxchg). */
+    pl1e = map_domain_page(_mfn(mfn));
+    pl1e = (l1_pgentry_t *)((unsigned long)pl1e + (addr & ~PAGE_MASK));
+    if ( do_cmpxchg )
+    {
+        bool okay;
+        intpte_t t = old;
+
+        ol1e = l1e_from_intpte(old);
+        okay = paging_cmpxchg_guest_entry(v, &l1e_get_intpte(*pl1e),
+                                          &t, l1e_get_intpte(nl1e), _mfn(mfn));
+        okay = (okay && t == old);
+
+        if ( !okay )
+        {
+            unmap_domain_page(pl1e);
+            put_page_from_l1e(nl1e, d);
+            return X86EMUL_RETRY;
+        }
+    }
+    else
+    {
+        ol1e = *pl1e;
+        if ( !UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn, v, 0) )
+            BUG();
+    }
+
+    trace_ptwr_emulation(addr, nl1e);
+
+    unmap_domain_page(pl1e);
+
+    /* Finally, drop the old PTE. */
+    put_page_from_l1e(ol1e, d);
+
+    return X86EMUL_OKAY;
+}
+
+static int ptwr_emulated_write(enum x86_segment seg, unsigned long offset,
+                               void *p_data, unsigned int bytes,
+                               struct x86_emulate_ctxt *ctxt)
+{
+    paddr_t val = 0;
+
+    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) || !bytes )
+    {
+        gdprintk(XENLOG_WARNING, "bad write size (addr=%lx, bytes=%u)\n",
+                 offset, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    memcpy(&val, p_data, bytes);
+
+    return ptwr_emulated_update(
+        offset, 0, val, bytes, 0,
+        container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
+}
+
+static int ptwr_emulated_cmpxchg(enum x86_segment seg, unsigned long offset,
+                                 void *p_old, void *p_new, unsigned int bytes,
+                                 struct x86_emulate_ctxt *ctxt)
+{
+    paddr_t old = 0, new = 0;
+
+    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) )
+    {
+        gdprintk(XENLOG_WARNING, "bad cmpxchg size (addr=%lx, bytes=%u)\n",
+                 offset, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    memcpy(&old, p_old, bytes);
+    memcpy(&new, p_new, bytes);
+
+    return ptwr_emulated_update(
+        offset, old, new, bytes, 1,
+        container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
+}
+
+static const struct x86_emulate_ops ptwr_emulate_ops = {
+    .read       = pv_emul_ptwr_read,
+    .insn_fetch = pv_emul_ptwr_read,
+    .write      = ptwr_emulated_write,
+    .cmpxchg    = ptwr_emulated_cmpxchg,
+    .validate   = pv_emul_is_mem_write,
+    .cpuid      = pv_emul_cpuid,
+};
+
+/* Write page fault handler: check if guest is trying to modify a PTE. */
+int ptwr_do_page_fault(struct vcpu *v, unsigned long addr,
+                       struct cpu_user_regs *regs)
+{
+    struct domain *d = v->domain;
+    struct page_info *page;
+    l1_pgentry_t      pte;
+    struct ptwr_emulate_ctxt ptwr_ctxt = {
+        .ctxt = {
+            .regs = regs,
+            .vendor = d->arch.cpuid->x86_vendor,
+            .addr_size = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
+            .sp_size   = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
+            .lma       = !is_pv_32bit_domain(d),
+        },
+    };
+    int rc;
+
+    /* Attempt to read the PTE that maps the VA being accessed. */
+    pv_get_guest_eff_l1e(addr, &pte);
+
+    /* We are looking only for read-only mappings of p.t. pages. */
+    if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) ||
+         rangeset_contains_singleton(mmio_ro_ranges, l1e_get_pfn(pte)) ||
+         get_page_from_pagenr(l1e_get_pfn(pte), d) )
+        goto bail;
+
+    page = l1e_get_page(pte);
+    if ( !page_lock(page) )
+    {
+        put_page(page);
+        goto bail;
+    }
+
+    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(page);
+        put_page(page);
+        goto bail;
+    }
+
+    ptwr_ctxt.cr2 = addr;
+    ptwr_ctxt.pte = pte;
+
+    rc = x86_emulate(&ptwr_ctxt.ctxt, &ptwr_emulate_ops);
+
+    page_unlock(page);
+    put_page(page);
+
+    switch ( rc )
+    {
+    case X86EMUL_EXCEPTION:
+        /*
+         * This emulation only covers writes to pagetables which are marked
+         * read-only by Xen.  We tolerate #PF (in case a concurrent pagetable
+         * update has succeeded on a different vcpu).  Anything else is an
+         * emulation bug, or a guest playing with the instruction stream under
+         * Xen's feet.
+         */
+        if ( ptwr_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
+             ptwr_ctxt.ctxt.event.vector == TRAP_page_fault )
+            pv_inject_event(&ptwr_ctxt.ctxt.event);
+        else
+            gdprintk(XENLOG_WARNING,
+                     "Unexpected event (type %u, vector %#x) from emulation\n",
+                     ptwr_ctxt.ctxt.event.type, ptwr_ctxt.ctxt.event.vector);
+
+        /* Fallthrough */
+    case X86EMUL_OKAY:
+
+        if ( ptwr_ctxt.ctxt.retire.singlestep )
+            pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
+
+        /* Fallthrough */
+    case X86EMUL_RETRY:
+        perfc_incr(ptwr_emulations);
+        return EXCRET_fault_fixed;
+    }
+
+ bail:
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/pv/emulate.h b/xen/arch/x86/pv/emulate.h
index 89abbe010f..7fb568adc0 100644
--- a/xen/arch/x86/pv/emulate.h
+++ b/xen/arch/x86/pv/emulate.h
@@ -10,4 +10,6 @@ void pv_emul_instruction_done(struct cpu_user_regs *regs, unsigned long rip);
 int pv_emul_is_mem_write(const struct x86_emulate_state *state,
                          struct x86_emulate_ctxt *ctxt);
 
+int pv_emul_ptwr_read(enum x86_segment seg, unsigned long offset, void *p_data,
+                      unsigned int bytes, struct x86_emulate_ctxt *ctxt);
 #endif /* __PV_EMULATE_H__ */
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 12/21] x86/mm: split out readonly MMIO emulation code
  2017-07-20 16:04 [PATCH v3 00/21] x86: refactor mm.c (the easy part) Wei Liu
                   ` (10 preceding siblings ...)
  2017-07-20 16:04 ` [PATCH v3 11/21] x86/mm: split out writable pagetable emulation code Wei Liu
@ 2017-07-20 16:04 ` Wei Liu
  2017-07-20 16:04 ` [PATCH v3 13/21] x86/mm: remove the unused inclusion of pv/emulate.h Wei Liu
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-20 16:04 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Move the code to pv/emul-mmio-op.c, fixing coding style issues in the
process.

Note that mmio_ro_emulated_write is needed by both PV and HVM, so it
is left in x86/mm.c.
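
For illustration, the alignment predicate in mmcfg_intercept_write() (moved
below unchanged) accepts only naturally-aligned 1/2/4-byte stores; with a
few assumed sample values:

    /* (bytes | offset) & (bytes - 1) is zero iff bytes is a power of two
     * and offset is a multiple of bytes: */
    bytes = 4, offset = 0x1004:  (4 | 0x1004) & 3 == 0  /* allowed  */
    bytes = 4, offset = 0x1006:  (4 | 0x1006) & 3 == 2  /* rejected */
    bytes = 3, offset = 0x1000:  (3 | 0x1000) & 2 == 2  /* rejected */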

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c              | 129 --------------------------------
 xen/arch/x86/pv/Makefile       |   1 +
 xen/arch/x86/pv/emul-mmio-op.c | 166 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 167 insertions(+), 129 deletions(-)
 create mode 100644 xen/arch/x86/pv/emul-mmio-op.c

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 548780aba6..26ad4a2e3b 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5013,11 +5013,6 @@ long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     return 0;
 }
 
-
-/*************************
- * fault handling for read-only MMIO pages
- */
-
 int mmio_ro_emulated_write(
     enum x86_segment seg,
     unsigned long offset,
@@ -5039,130 +5034,6 @@ int mmio_ro_emulated_write(
     return X86EMUL_OKAY;
 }
 
-static const struct x86_emulate_ops mmio_ro_emulate_ops = {
-    .read       = x86emul_unhandleable_rw,
-    .insn_fetch = pv_emul_ptwr_read,
-    .write      = mmio_ro_emulated_write,
-    .validate   = pv_emul_is_mem_write,
-    .cpuid      = pv_emul_cpuid,
-};
-
-int mmcfg_intercept_write(
-    enum x86_segment seg,
-    unsigned long offset,
-    void *p_data,
-    unsigned int bytes,
-    struct x86_emulate_ctxt *ctxt)
-{
-    struct mmio_ro_emulate_ctxt *mmio_ctxt = ctxt->data;
-
-    /*
-     * Only allow naturally-aligned stores no wider than 4 bytes to the
-     * original %cr2 address.
-     */
-    if ( ((bytes | offset) & (bytes - 1)) || bytes > 4 || !bytes ||
-         offset != mmio_ctxt->cr2 )
-    {
-        gdprintk(XENLOG_WARNING, "bad write (cr2=%lx, addr=%lx, bytes=%u)\n",
-                mmio_ctxt->cr2, offset, bytes);
-        return X86EMUL_UNHANDLEABLE;
-    }
-
-    offset &= 0xfff;
-    if ( pci_conf_write_intercept(mmio_ctxt->seg, mmio_ctxt->bdf,
-                                  offset, bytes, p_data) >= 0 )
-        pci_mmcfg_write(mmio_ctxt->seg, PCI_BUS(mmio_ctxt->bdf),
-                        PCI_DEVFN2(mmio_ctxt->bdf), offset, bytes,
-                        *(uint32_t *)p_data);
-
-    return X86EMUL_OKAY;
-}
-
-static const struct x86_emulate_ops mmcfg_intercept_ops = {
-    .read       = x86emul_unhandleable_rw,
-    .insn_fetch = pv_emul_ptwr_read,
-    .write      = mmcfg_intercept_write,
-    .validate   = pv_emul_is_mem_write,
-    .cpuid      = pv_emul_cpuid,
-};
-
-/* Check if guest is trying to modify a r/o MMIO page. */
-int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr,
-                          struct cpu_user_regs *regs)
-{
-    l1_pgentry_t pte;
-    unsigned long mfn;
-    unsigned int addr_size = is_pv_32bit_vcpu(v) ? 32 : BITS_PER_LONG;
-    struct mmio_ro_emulate_ctxt mmio_ro_ctxt = { .cr2 = addr };
-    struct x86_emulate_ctxt ctxt = {
-        .regs = regs,
-        .vendor = v->domain->arch.cpuid->x86_vendor,
-        .addr_size = addr_size,
-        .sp_size = addr_size,
-        .lma = !is_pv_32bit_vcpu(v),
-        .data = &mmio_ro_ctxt,
-    };
-    int rc;
-
-    /* Attempt to read the PTE that maps the VA being accessed. */
-    pv_get_guest_eff_l1e(addr, &pte);
-
-    /* We are looking only for read-only mappings of MMIO pages. */
-    if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) )
-        return 0;
-
-    mfn = l1e_get_pfn(pte);
-    if ( mfn_valid(_mfn(mfn)) )
-    {
-        struct page_info *page = mfn_to_page(mfn);
-        struct domain *owner = page_get_owner_and_reference(page);
-
-        if ( owner )
-            put_page(page);
-        if ( owner != dom_io )
-            return 0;
-    }
-
-    if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
-        return 0;
-
-    if ( pci_ro_mmcfg_decode(mfn, &mmio_ro_ctxt.seg, &mmio_ro_ctxt.bdf) )
-        rc = x86_emulate(&ctxt, &mmcfg_intercept_ops);
-    else
-        rc = x86_emulate(&ctxt, &mmio_ro_emulate_ops);
-
-    switch ( rc )
-    {
-    case X86EMUL_EXCEPTION:
-        /*
-         * This emulation only covers writes to MMCFG space or read-only MFNs.
-         * We tolerate #PF (from hitting an adjacent page or a successful
-         * concurrent pagetable update).  Anything else is an emulation bug,
-         * or a guest playing with the instruction stream under Xen's feet.
-         */
-        if ( ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
-             ctxt.event.vector == TRAP_page_fault )
-            pv_inject_event(&ctxt.event);
-        else
-            gdprintk(XENLOG_WARNING,
-                     "Unexpected event (type %u, vector %#x) from emulation\n",
-                     ctxt.event.type, ctxt.event.vector);
-
-        /* Fallthrough */
-    case X86EMUL_OKAY:
-
-        if ( ctxt.retire.singlestep )
-            pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
-
-        /* Fallthrough */
-    case X86EMUL_RETRY:
-        perfc_incr(ptwr_emulations);
-        return EXCRET_fault_fixed;
-    }
-
-    return 0;
-}
-
 void *alloc_xen_pagetable(void)
 {
     if ( system_state != SYS_STATE_early_boot )
diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile
index cbd890c5f2..016b1b6e8f 100644
--- a/xen/arch/x86/pv/Makefile
+++ b/xen/arch/x86/pv/Makefile
@@ -3,6 +3,7 @@ obj-y += domain.o
 obj-y += emulate.o
 obj-y += emul-gate-op.o
 obj-y += emul-inv-op.o
+obj-y += emul-mmio-op.o
 obj-y += emul-priv-op.o
 obj-y += emul-ptwr-op.o
 obj-y += hypercall.o
diff --git a/xen/arch/x86/pv/emul-mmio-op.c b/xen/arch/x86/pv/emul-mmio-op.c
new file mode 100644
index 0000000000..ee5c684777
--- /dev/null
+++ b/xen/arch/x86/pv/emul-mmio-op.c
@@ -0,0 +1,166 @@
+/******************************************************************************
+ * arch/x86/pv/emul-mmio-op.c
+ *
+ * Readonly MMIO emulation for PV guests
+ *
+ * Copyright (c) 2002-2005 K A Fraser
+ * Copyright (c) 2004 Christian Limpach
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/rangeset.h>
+#include <xen/sched.h>
+
+#include <asm/domain.h>
+#include <asm/mm.h>
+#include <asm/pci.h>
+#include <asm/pv/mm.h>
+
+#include "emulate.h"
+
+/*************************
+ * fault handling for read-only MMIO pages
+ */
+
+static const struct x86_emulate_ops mmio_ro_emulate_ops = {
+    .read       = x86emul_unhandleable_rw,
+    .insn_fetch = pv_emul_ptwr_read,
+    .write      = mmio_ro_emulated_write,
+    .validate   = pv_emul_is_mem_write,
+    .cpuid      = pv_emul_cpuid,
+};
+
+int mmcfg_intercept_write(enum x86_segment seg, unsigned long offset,
+                          void *p_data, unsigned int bytes,
+                          struct x86_emulate_ctxt *ctxt)
+{
+    struct mmio_ro_emulate_ctxt *mmio_ctxt = ctxt->data;
+
+    /*
+     * Only allow naturally-aligned stores no wider than 4 bytes to the
+     * original %cr2 address.
+     */
+    if ( ((bytes | offset) & (bytes - 1)) || bytes > 4 || !bytes ||
+         offset != mmio_ctxt->cr2 )
+    {
+        gdprintk(XENLOG_WARNING, "bad write (cr2=%lx, addr=%lx, bytes=%u)\n",
+                 mmio_ctxt->cr2, offset, bytes);
+        return X86EMUL_UNHANDLEABLE;
+    }
+
+    offset &= 0xfff;
+    if ( pci_conf_write_intercept(mmio_ctxt->seg, mmio_ctxt->bdf,
+                                  offset, bytes, p_data) >= 0 )
+        pci_mmcfg_write(mmio_ctxt->seg, PCI_BUS(mmio_ctxt->bdf),
+                        PCI_DEVFN2(mmio_ctxt->bdf), offset, bytes,
+                        *(uint32_t *)p_data);
+
+    return X86EMUL_OKAY;
+}
+
+static const struct x86_emulate_ops mmcfg_intercept_ops = {
+    .read       = x86emul_unhandleable_rw,
+    .insn_fetch = pv_emul_ptwr_read,
+    .write      = mmcfg_intercept_write,
+    .validate   = pv_emul_is_mem_write,
+    .cpuid      = pv_emul_cpuid,
+};
+
+/* Check if guest is trying to modify a r/o MMIO page. */
+int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr,
+                          struct cpu_user_regs *regs)
+{
+    l1_pgentry_t pte;
+    unsigned long mfn;
+    unsigned int addr_size = is_pv_32bit_vcpu(v) ? 32 : BITS_PER_LONG;
+    struct mmio_ro_emulate_ctxt mmio_ro_ctxt = { .cr2 = addr };
+    struct x86_emulate_ctxt ctxt = {
+        .regs = regs,
+        .vendor = v->domain->arch.cpuid->x86_vendor,
+        .addr_size = addr_size,
+        .sp_size = addr_size,
+        .lma = !is_pv_32bit_vcpu(v),
+        .data = &mmio_ro_ctxt,
+    };
+    int rc;
+
+    /* Attempt to read the PTE that maps the VA being accessed. */
+    pv_get_guest_eff_l1e(addr, &pte);
+
+    /* We are looking only for read-only mappings of MMIO pages. */
+    if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) )
+        return 0;
+
+    mfn = l1e_get_pfn(pte);
+    if ( mfn_valid(_mfn(mfn)) )
+    {
+        struct page_info *page = mfn_to_page(mfn);
+        struct domain *owner = page_get_owner_and_reference(page);
+
+        if ( owner )
+            put_page(page);
+        if ( owner != dom_io )
+            return 0;
+    }
+
+    if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+        return 0;
+
+    if ( pci_ro_mmcfg_decode(mfn, &mmio_ro_ctxt.seg, &mmio_ro_ctxt.bdf) )
+        rc = x86_emulate(&ctxt, &mmcfg_intercept_ops);
+    else
+        rc = x86_emulate(&ctxt, &mmio_ro_emulate_ops);
+
+    switch ( rc )
+    {
+    case X86EMUL_EXCEPTION:
+        /*
+         * This emulation only covers writes to MMCFG space or read-only MFNs.
+         * We tolerate #PF (from hitting an adjacent page or a successful
+         * concurrent pagetable update).  Anything else is an emulation bug,
+         * or a guest playing with the instruction stream under Xen's feet.
+         */
+        if ( ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
+             ctxt.event.vector == TRAP_page_fault )
+            pv_inject_event(&ctxt.event);
+        else
+            gdprintk(XENLOG_WARNING,
+                     "Unexpected event (type %u, vector %#x) from emulation\n",
+                     ctxt.event.type, ctxt.event.vector);
+
+        /* Fallthrough */
+    case X86EMUL_OKAY:
+
+        if ( ctxt.retire.singlestep )
+            pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
+
+        /* Fallthrough */
+    case X86EMUL_RETRY:
+        perfc_incr(ptwr_emulations);
+        return EXCRET_fault_fixed;
+    }
+
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 13/21] x86/mm: remove the unused inclusion of pv/emulate.h
  2017-07-20 16:04 [PATCH v3 00/21] x86: refactor mm.c (the easy part) Wei Liu
                   ` (11 preceding siblings ...)
  2017-07-20 16:04 ` [PATCH v3 12/21] x86/mm: split out readonly MMIO " Wei Liu
@ 2017-07-20 16:04 ` Wei Liu
  2017-07-20 16:04 ` [PATCH v3 14/21] x86/mm: move and rename guest_{,un}map_l1e Wei Liu
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-20 16:04 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

All the emulation code has now been moved out of mm.c, so the inclusion
is no longer needed.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 26ad4a2e3b..c0a6ecc5b6 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -127,8 +127,6 @@
 #include <asm/pv/grant_table.h>
 #include <asm/pv/mm.h>
 
-#include "pv/emulate.h"
-
 /* Mapping of the fixmap space needed early. */
 l1_pgentry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
     l1_fixmap[L1_PAGETABLE_ENTRIES];
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 14/21] x86/mm: move and rename guest_{,un}map_l1e
  2017-07-20 16:04 [PATCH v3 00/21] x86: refactor mm.c (the easy part) Wei Liu
                   ` (12 preceding siblings ...)
  2017-07-20 16:04 ` [PATCH v3 13/21] x86/mm: remove the unused inclusion of pv/emulate.h Wei Liu
@ 2017-07-20 16:04 ` Wei Liu
  2017-07-20 16:04 ` [PATCH v3 15/21] x86/mm: split out PV grant table code Wei Liu
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-20 16:04 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Move them to pv/mm.c and rename them to pv_{,un}map_guest_l1e.
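
The resulting call pattern (sketch, mirroring the call sites in the hunks
below):

    unsigned long gl1mfn;
    l1_pgentry_t *pl1e = pv_map_guest_l1e(va, &gl1mfn);

    if ( !pl1e )
        return GNTST_general_error;
    /* ... take a reference on gl1mfn, lock it, update the entry ... */
    pv_unmap_guest_l1e(pl1e);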

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c           | 63 +++++++++++----------------------------------
 xen/arch/x86/pv/mm.c        | 33 ++++++++++++++++++++++++
 xen/include/asm-x86/pv/mm.h |  9 +++++++
 3 files changed, 57 insertions(+), 48 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index c0a6ecc5b6..5a9cc7173a 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -535,39 +535,6 @@ void update_cr3(struct vcpu *v)
     make_cr3(v, cr3_mfn);
 }
 
-/* Get a mapping of a PV guest's l1e for this virtual address. */
-static l1_pgentry_t *guest_map_l1e(unsigned long addr, unsigned long *gl1mfn)
-{
-    l2_pgentry_t l2e;
-
-    ASSERT(!paging_mode_translate(current->domain));
-    ASSERT(!paging_mode_external(current->domain));
-
-    if ( unlikely(!__addr_ok(addr)) )
-        return NULL;
-
-    /* Find this l1e and its enclosing l1mfn in the linear map. */
-    if ( __copy_from_user(&l2e,
-                          &__linear_l2_table[l2_linear_offset(addr)],
-                          sizeof(l2_pgentry_t)) )
-        return NULL;
-
-    /* Check flags that it will be safe to read the l1e. */
-    if ( (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT )
-        return NULL;
-
-    *gl1mfn = l2e_get_pfn(l2e);
-
-    return (l1_pgentry_t *)map_domain_page(_mfn(*gl1mfn)) +
-           l1_table_offset(addr);
-}
-
-/* Pull down the mapping we got from guest_map_l1e(). */
-static inline void guest_unmap_l1e(void *p)
-{
-    unmap_domain_page(p);
-}
-
 static inline void page_set_tlbflush_timestamp(struct page_info *page)
 {
     /*
@@ -4014,7 +3981,7 @@ static int create_grant_va_mapping(
 
     adjust_guest_l1e(nl1e, d);
 
-    pl1e = guest_map_l1e(va, &gl1mfn);
+    pl1e = pv_map_guest_l1e(va, &gl1mfn);
     if ( !pl1e )
     {
         gdprintk(XENLOG_WARNING, "Could not find L1 PTE for address %lx\n", va);
@@ -4023,7 +3990,7 @@ static int create_grant_va_mapping(
 
     if ( get_page_from_pagenr(gl1mfn, current->domain) )
     {
-        guest_unmap_l1e(pl1e);
+        pv_unmap_guest_l1e(pl1e);
         return GNTST_general_error;
     }
 
@@ -4031,7 +3998,7 @@ static int create_grant_va_mapping(
     if ( !page_lock(l1pg) )
     {
         put_page(l1pg);
-        guest_unmap_l1e(pl1e);
+        pv_unmap_guest_l1e(pl1e);
         return GNTST_general_error;
     }
 
@@ -4039,7 +4006,7 @@ static int create_grant_va_mapping(
     {
         page_unlock(l1pg);
         put_page(l1pg);
-        guest_unmap_l1e(pl1e);
+        pv_unmap_guest_l1e(pl1e);
         return GNTST_general_error;
     }
 
@@ -4048,7 +4015,7 @@ static int create_grant_va_mapping(
 
     page_unlock(l1pg);
     put_page(l1pg);
-    guest_unmap_l1e(pl1e);
+    pv_unmap_guest_l1e(pl1e);
 
     if ( okay )
         put_page_from_l1e(ol1e, d);
@@ -4064,7 +4031,7 @@ static int replace_grant_va_mapping(
     struct page_info *l1pg;
     int rc = 0;
 
-    pl1e = guest_map_l1e(addr, &gl1mfn);
+    pl1e = pv_map_guest_l1e(addr, &gl1mfn);
     if ( !pl1e )
     {
         gdprintk(XENLOG_WARNING, "Could not find L1 PTE for address %lx\n", addr);
@@ -4115,7 +4082,7 @@ static int replace_grant_va_mapping(
     page_unlock(l1pg);
     put_page(l1pg);
  out:
-    guest_unmap_l1e(pl1e);
+    pv_unmap_guest_l1e(pl1e);
     return rc;
 }
 
@@ -4173,7 +4140,7 @@ int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
     if ( !new_addr )
         return destroy_grant_va_mapping(addr, frame, curr);
 
-    pl1e = guest_map_l1e(new_addr, &gl1mfn);
+    pl1e = pv_map_guest_l1e(new_addr, &gl1mfn);
     if ( !pl1e )
     {
         gdprintk(XENLOG_WARNING,
@@ -4183,7 +4150,7 @@ int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
 
     if ( get_page_from_pagenr(gl1mfn, current->domain) )
     {
-        guest_unmap_l1e(pl1e);
+        pv_unmap_guest_l1e(pl1e);
         return GNTST_general_error;
     }
 
@@ -4191,7 +4158,7 @@ int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
     if ( !page_lock(l1pg) )
     {
         put_page(l1pg);
-        guest_unmap_l1e(pl1e);
+        pv_unmap_guest_l1e(pl1e);
         return GNTST_general_error;
     }
 
@@ -4199,7 +4166,7 @@ int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
     {
         page_unlock(l1pg);
         put_page(l1pg);
-        guest_unmap_l1e(pl1e);
+        pv_unmap_guest_l1e(pl1e);
         return GNTST_general_error;
     }
 
@@ -4211,13 +4178,13 @@ int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
         page_unlock(l1pg);
         put_page(l1pg);
         gdprintk(XENLOG_WARNING, "Cannot delete PTE entry at %p\n", pl1e);
-        guest_unmap_l1e(pl1e);
+        pv_unmap_guest_l1e(pl1e);
         return GNTST_general_error;
     }
 
     page_unlock(l1pg);
     put_page(l1pg);
-    guest_unmap_l1e(pl1e);
+    pv_unmap_guest_l1e(pl1e);
 
     rc = replace_grant_va_mapping(addr, frame, ol1e, curr);
     if ( rc )
@@ -4351,7 +4318,7 @@ static int __do_update_va_mapping(
         return rc;
 
     rc = -EINVAL;
-    pl1e = guest_map_l1e(va, &gl1mfn);
+    pl1e = pv_map_guest_l1e(va, &gl1mfn);
     if ( unlikely(!pl1e || get_page_from_pagenr(gl1mfn, d)) )
         goto out;
 
@@ -4376,7 +4343,7 @@ static int __do_update_va_mapping(
 
  out:
     if ( pl1e )
-        guest_unmap_l1e(pl1e);
+        pv_unmap_guest_l1e(pl1e);
 
     switch ( flags & UVMF_FLUSHTYPE_MASK )
     {
diff --git a/xen/arch/x86/pv/mm.c b/xen/arch/x86/pv/mm.c
index 2cb5995e62..32e73d59df 100644
--- a/xen/arch/x86/pv/mm.c
+++ b/xen/arch/x86/pv/mm.c
@@ -63,6 +63,39 @@ void pv_get_guest_eff_kern_l1e(struct vcpu *v, unsigned long addr,
         toggle_guest_mode(v);
 }
 
+/* Get a mapping of a PV guest's l1e for this virtual address. */
+l1_pgentry_t *pv_map_guest_l1e(unsigned long addr, unsigned long *gl1mfn)
+{
+    l2_pgentry_t l2e;
+
+    ASSERT(!paging_mode_translate(current->domain));
+    ASSERT(!paging_mode_external(current->domain));
+
+    if ( unlikely(!__addr_ok(addr)) )
+        return NULL;
+
+    /* Find this l1e and its enclosing l1mfn in the linear map. */
+    if ( __copy_from_user(&l2e,
+                          &__linear_l2_table[l2_linear_offset(addr)],
+                          sizeof(l2_pgentry_t)) )
+        return NULL;
+
+    /* Check flags that it will be safe to read the l1e. */
+    if ( (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT )
+        return NULL;
+
+    *gl1mfn = l2e_get_pfn(l2e);
+
+    return (l1_pgentry_t *)map_domain_page(_mfn(*gl1mfn)) +
+           l1_table_offset(addr);
+}
+
+/* Pull down the mapping we got from pv_map_guest_l1e(). */
+void pv_unmap_guest_l1e(void *p)
+{
+    unmap_domain_page(p);
+}
+
 /*
  * How to write an entry to the guest pagetables.
  * Returns false for failure (pointer not valid), true for success.
diff --git a/xen/include/asm-x86/pv/mm.h b/xen/include/asm-x86/pv/mm.h
index 4931bccb29..a71ce934fa 100644
--- a/xen/include/asm-x86/pv/mm.h
+++ b/xen/include/asm-x86/pv/mm.h
@@ -87,6 +87,9 @@ bool pv_update_intpte(intpte_t *p, intpte_t old, intpte_t new,
                      _t ## e_get_intpte(_o), _t ## e_get_intpte(_n),   \
                      (_m), (_v), (_ad))
 
+l1_pgentry_t *pv_map_guest_l1e(unsigned long addr, unsigned long *gl1mfn);
+void pv_unmap_guest_l1e(void *p);
+
 #else
 
 static inline void pv_get_guest_eff_l1e(unsigned long addr,
@@ -102,6 +105,12 @@ static inline bool pv_update_intpte(intpte_t *p, intpte_t old, intpte_t new,
                                     int preserve_ad)
 { return false; }
 
+static inline l1_pgentry_t *pv_map_guest_l1e(unsigned long addr,
+                                             unsigned long *gl1mfn)
+{ return NULL; }
+
+static inline void pv_unmap_guest_l1e(void *p) {}
+
 #endif
 
 #endif /* __X86_PV_MM_H__ */
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 15/21] x86/mm: split out PV grant table code
  2017-07-20 16:04 [PATCH v3 00/21] x86: refactor mm.c (the easy part) Wei Liu
                   ` (13 preceding siblings ...)
  2017-07-20 16:04 ` [PATCH v3 14/21] x86/mm: move and rename guest_{,un}map_l1e Wei Liu
@ 2017-07-20 16:04 ` Wei Liu
  2017-07-20 16:04 ` [PATCH v3 16/21] x86/mm: split out descriptor " Wei Liu
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-20 16:04 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Move the PV-only grant table mapping code from x86/mm.c to its own file,
pv/grant_table.c.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c             | 349 --------------------------------------
 xen/arch/x86/pv/Makefile      |   1 +
 xen/arch/x86/pv/grant_table.c | 386 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 387 insertions(+), 349 deletions(-)
 create mode 100644 xen/arch/x86/pv/grant_table.c

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 5a9cc7173a..897db4cfb9 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3844,355 +3844,6 @@ long do_mmu_update(
 }
 
 
-static int create_grant_pte_mapping(
-    uint64_t pte_addr, l1_pgentry_t nl1e, struct vcpu *v)
-{
-    int rc = GNTST_okay;
-    void *va;
-    unsigned long gmfn, mfn;
-    struct page_info *page;
-    l1_pgentry_t ol1e;
-    struct domain *d = v->domain;
-
-    adjust_guest_l1e(nl1e, d);
-
-    gmfn = pte_addr >> PAGE_SHIFT;
-    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
-
-    if ( unlikely(!page) )
-    {
-        gdprintk(XENLOG_WARNING, "Could not get page for normal update\n");
-        return GNTST_general_error;
-    }
-
-    mfn = page_to_mfn(page);
-    va = map_domain_page(_mfn(mfn));
-    va = (void *)((unsigned long)va + ((unsigned long)pte_addr & ~PAGE_MASK));
-
-    if ( !page_lock(page) )
-    {
-        rc = GNTST_general_error;
-        goto failed;
-    }
-
-    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-    {
-        page_unlock(page);
-        rc = GNTST_general_error;
-        goto failed;
-    }
-
-    ol1e = *(l1_pgentry_t *)va;
-    if ( !UPDATE_ENTRY(l1, (l1_pgentry_t *)va, ol1e, nl1e, mfn, v, 0) )
-    {
-        page_unlock(page);
-        rc = GNTST_general_error;
-        goto failed;
-    }
-
-    page_unlock(page);
-
-    put_page_from_l1e(ol1e, d);
-
- failed:
-    unmap_domain_page(va);
-    put_page(page);
-
-    return rc;
-}
-
-static int destroy_grant_pte_mapping(
-    uint64_t addr, unsigned long frame, struct domain *d)
-{
-    int rc = GNTST_okay;
-    void *va;
-    unsigned long gmfn, mfn;
-    struct page_info *page;
-    l1_pgentry_t ol1e;
-
-    gmfn = addr >> PAGE_SHIFT;
-    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
-
-    if ( unlikely(!page) )
-    {
-        gdprintk(XENLOG_WARNING, "Could not get page for normal update\n");
-        return GNTST_general_error;
-    }
-
-    mfn = page_to_mfn(page);
-    va = map_domain_page(_mfn(mfn));
-    va = (void *)((unsigned long)va + ((unsigned long)addr & ~PAGE_MASK));
-
-    if ( !page_lock(page) )
-    {
-        rc = GNTST_general_error;
-        goto failed;
-    }
-
-    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-    {
-        page_unlock(page);
-        rc = GNTST_general_error;
-        goto failed;
-    }
-
-    ol1e = *(l1_pgentry_t *)va;
-
-    /* Check that the virtual address supplied is actually mapped to frame. */
-    if ( unlikely(l1e_get_pfn(ol1e) != frame) )
-    {
-        page_unlock(page);
-        gdprintk(XENLOG_WARNING,
-                 "PTE entry %"PRIpte" for address %"PRIx64" doesn't match frame %lx\n",
-                 l1e_get_intpte(ol1e), addr, frame);
-        rc = GNTST_general_error;
-        goto failed;
-    }
-
-    /* Delete pagetable entry. */
-    if ( unlikely(!UPDATE_ENTRY(l1,
-                                (l1_pgentry_t *)va, ol1e, l1e_empty(), mfn,
-                                d->vcpu[0] /* Change if we go to per-vcpu shadows. */,
-                                0)) )
-    {
-        page_unlock(page);
-        gdprintk(XENLOG_WARNING, "Cannot delete PTE entry at %p\n", va);
-        rc = GNTST_general_error;
-        goto failed;
-    }
-
-    page_unlock(page);
-
- failed:
-    unmap_domain_page(va);
-    put_page(page);
-    return rc;
-}
-
-
-static int create_grant_va_mapping(
-    unsigned long va, l1_pgentry_t nl1e, struct vcpu *v)
-{
-    l1_pgentry_t *pl1e, ol1e;
-    struct domain *d = v->domain;
-    unsigned long gl1mfn;
-    struct page_info *l1pg;
-    int okay;
-
-    adjust_guest_l1e(nl1e, d);
-
-    pl1e = pv_map_guest_l1e(va, &gl1mfn);
-    if ( !pl1e )
-    {
-        gdprintk(XENLOG_WARNING, "Could not find L1 PTE for address %lx\n", va);
-        return GNTST_general_error;
-    }
-
-    if ( get_page_from_pagenr(gl1mfn, current->domain) )
-    {
-        pv_unmap_guest_l1e(pl1e);
-        return GNTST_general_error;
-    }
-
-    l1pg = mfn_to_page(gl1mfn);
-    if ( !page_lock(l1pg) )
-    {
-        put_page(l1pg);
-        pv_unmap_guest_l1e(pl1e);
-        return GNTST_general_error;
-    }
-
-    if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-    {
-        page_unlock(l1pg);
-        put_page(l1pg);
-        pv_unmap_guest_l1e(pl1e);
-        return GNTST_general_error;
-    }
-
-    ol1e = *pl1e;
-    okay = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0);
-
-    page_unlock(l1pg);
-    put_page(l1pg);
-    pv_unmap_guest_l1e(pl1e);
-
-    if ( okay )
-        put_page_from_l1e(ol1e, d);
-
-    return okay ? GNTST_okay : GNTST_general_error;
-}
-
-static int replace_grant_va_mapping(
-    unsigned long addr, unsigned long frame, l1_pgentry_t nl1e, struct vcpu *v)
-{
-    l1_pgentry_t *pl1e, ol1e;
-    unsigned long gl1mfn;
-    struct page_info *l1pg;
-    int rc = 0;
-
-    pl1e = pv_map_guest_l1e(addr, &gl1mfn);
-    if ( !pl1e )
-    {
-        gdprintk(XENLOG_WARNING, "Could not find L1 PTE for address %lx\n", addr);
-        return GNTST_general_error;
-    }
-
-    if ( get_page_from_pagenr(gl1mfn, current->domain) )
-    {
-        rc = GNTST_general_error;
-        goto out;
-    }
-
-    l1pg = mfn_to_page(gl1mfn);
-    if ( !page_lock(l1pg) )
-    {
-        rc = GNTST_general_error;
-        put_page(l1pg);
-        goto out;
-    }
-
-    if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-    {
-        rc = GNTST_general_error;
-        goto unlock_and_out;
-    }
-
-    ol1e = *pl1e;
-
-    /* Check that the virtual address supplied is actually mapped to frame. */
-    if ( unlikely(l1e_get_pfn(ol1e) != frame) )
-    {
-        gdprintk(XENLOG_WARNING,
-                 "PTE entry %lx for address %lx doesn't match frame %lx\n",
-                 l1e_get_pfn(ol1e), addr, frame);
-        rc = GNTST_general_error;
-        goto unlock_and_out;
-    }
-
-    /* Delete pagetable entry. */
-    if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0)) )
-    {
-        gdprintk(XENLOG_WARNING, "Cannot delete PTE entry at %p\n", pl1e);
-        rc = GNTST_general_error;
-        goto unlock_and_out;
-    }
-
- unlock_and_out:
-    page_unlock(l1pg);
-    put_page(l1pg);
- out:
-    pv_unmap_guest_l1e(pl1e);
-    return rc;
-}
-
-static int destroy_grant_va_mapping(
-    unsigned long addr, unsigned long frame, struct vcpu *v)
-{
-    return replace_grant_va_mapping(addr, frame, l1e_empty(), v);
-}
-
-int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
-                            unsigned int flags, unsigned int cache_flags)
-{
-    l1_pgentry_t pte;
-    uint32_t grant_pte_flags;
-
-    grant_pte_flags =
-        _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_GNTTAB;
-    if ( cpu_has_nx )
-        grant_pte_flags |= _PAGE_NX_BIT;
-
-    pte = l1e_from_pfn(frame, grant_pte_flags);
-    if ( (flags & GNTMAP_application_map) )
-        l1e_add_flags(pte,_PAGE_USER);
-    if ( !(flags & GNTMAP_readonly) )
-        l1e_add_flags(pte,_PAGE_RW);
-
-    l1e_add_flags(pte,
-                  ((flags >> _GNTMAP_guest_avail0) * _PAGE_AVAIL0)
-                   & _PAGE_AVAIL);
-
-    l1e_add_flags(pte, cacheattr_to_pte_flags(cache_flags >> 5));
-
-    if ( flags & GNTMAP_contains_pte )
-        return create_grant_pte_mapping(addr, pte, current);
-    return create_grant_va_mapping(addr, pte, current);
-}
-
-int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
-                             uint64_t new_addr, unsigned int flags)
-{
-    struct vcpu *curr = current;
-    l1_pgentry_t *pl1e, ol1e;
-    unsigned long gl1mfn;
-    struct page_info *l1pg;
-    int rc;
-
-    if ( flags & GNTMAP_contains_pte )
-    {
-        if ( !new_addr )
-            return destroy_grant_pte_mapping(addr, frame, curr->domain);
-
-        return GNTST_general_error;
-    }
-
-    if ( !new_addr )
-        return destroy_grant_va_mapping(addr, frame, curr);
-
-    pl1e = pv_map_guest_l1e(new_addr, &gl1mfn);
-    if ( !pl1e )
-    {
-        gdprintk(XENLOG_WARNING,
-                 "Could not find L1 PTE for address %"PRIx64"\n", new_addr);
-        return GNTST_general_error;
-    }
-
-    if ( get_page_from_pagenr(gl1mfn, current->domain) )
-    {
-        pv_unmap_guest_l1e(pl1e);
-        return GNTST_general_error;
-    }
-
-    l1pg = mfn_to_page(gl1mfn);
-    if ( !page_lock(l1pg) )
-    {
-        put_page(l1pg);
-        pv_unmap_guest_l1e(pl1e);
-        return GNTST_general_error;
-    }
-
-    if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-    {
-        page_unlock(l1pg);
-        put_page(l1pg);
-        pv_unmap_guest_l1e(pl1e);
-        return GNTST_general_error;
-    }
-
-    ol1e = *pl1e;
-
-    if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, l1e_empty(),
-                                gl1mfn, curr, 0)) )
-    {
-        page_unlock(l1pg);
-        put_page(l1pg);
-        gdprintk(XENLOG_WARNING, "Cannot delete PTE entry at %p\n", pl1e);
-        pv_unmap_guest_l1e(pl1e);
-        return GNTST_general_error;
-    }
-
-    page_unlock(l1pg);
-    put_page(l1pg);
-    pv_unmap_guest_l1e(pl1e);
-
-    rc = replace_grant_va_mapping(addr, frame, ol1e, curr);
-    if ( rc )
-        put_page_from_l1e(ol1e, curr->domain);
-
-    return rc;
-}
-
 int donate_page(
     struct domain *d, struct page_info *page, unsigned int memflags)
 {
diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile
index 016b1b6e8f..501c766cc2 100644
--- a/xen/arch/x86/pv/Makefile
+++ b/xen/arch/x86/pv/Makefile
@@ -6,6 +6,7 @@ obj-y += emul-inv-op.o
 obj-y += emul-mmio-op.o
 obj-y += emul-priv-op.o
 obj-y += emul-ptwr-op.o
+obj-y += grant_table.o
 obj-y += hypercall.o
 obj-y += iret.o
 obj-y += misc-hypercalls.o
diff --git a/xen/arch/x86/pv/grant_table.c b/xen/arch/x86/pv/grant_table.c
new file mode 100644
index 0000000000..6c22cd01a7
--- /dev/null
+++ b/xen/arch/x86/pv/grant_table.c
@@ -0,0 +1,386 @@
+/******************************************************************************
+ * arch/x86/pv/grant_table.c
+ *
+ * Grant table interfaces for PV guests
+ *
+ * Copyright (C) 2017 Wei Liu <wei.liu2@citrix.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/types.h>
+
+#include <public/grant_table.h>
+
+#include <asm/p2m.h>
+#include <asm/pv/mm.h>
+
+static int create_grant_pte_mapping(uint64_t pte_addr, l1_pgentry_t nl1e,
+                                    struct vcpu *v)
+{
+    int rc = GNTST_okay;
+    void *va;
+    unsigned long gmfn, mfn;
+    struct page_info *page;
+    l1_pgentry_t ol1e;
+    struct domain *d = v->domain;
+
+    adjust_guest_l1e(nl1e, d);
+
+    gmfn = pte_addr >> PAGE_SHIFT;
+    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
+
+    if ( unlikely(!page) )
+    {
+        gdprintk(XENLOG_WARNING, "Could not get page for normal update\n");
+        return GNTST_general_error;
+    }
+
+    mfn = page_to_mfn(page);
+    va = map_domain_page(_mfn(mfn));
+    va = (void *)((unsigned long)va + ((unsigned long)pte_addr & ~PAGE_MASK));
+
+    if ( !page_lock(page) )
+    {
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(page);
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    ol1e = *(l1_pgentry_t *)va;
+    if ( !UPDATE_ENTRY(l1, (l1_pgentry_t *)va, ol1e, nl1e, mfn, v, 0) )
+    {
+        page_unlock(page);
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    page_unlock(page);
+
+    put_page_from_l1e(ol1e, d);
+
+ failed:
+    unmap_domain_page(va);
+    put_page(page);
+
+    return rc;
+}
+
+static int destroy_grant_pte_mapping(uint64_t addr, unsigned long frame,
+                                     struct domain *d)
+{
+    int rc = GNTST_okay;
+    void *va;
+    unsigned long gmfn, mfn;
+    struct page_info *page;
+    l1_pgentry_t ol1e;
+
+    gmfn = addr >> PAGE_SHIFT;
+    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
+
+    if ( unlikely(!page) )
+    {
+        gdprintk(XENLOG_WARNING, "Could not get page for normal update\n");
+        return GNTST_general_error;
+    }
+
+    mfn = page_to_mfn(page);
+    va = map_domain_page(_mfn(mfn));
+    va = (void *)((unsigned long)va + ((unsigned long)addr & ~PAGE_MASK));
+
+    if ( !page_lock(page) )
+    {
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(page);
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    ol1e = *(l1_pgentry_t *)va;
+
+    /* Check that the virtual address supplied is actually mapped to frame. */
+    if ( unlikely(l1e_get_pfn(ol1e) != frame) )
+    {
+        page_unlock(page);
+        gdprintk(XENLOG_WARNING,
+                 "PTE entry %"PRIpte" for address %"PRIx64" doesn't match frame %lx\n",
+                 l1e_get_intpte(ol1e), addr, frame);
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    /* Delete pagetable entry. */
+    if ( unlikely(!UPDATE_ENTRY(l1,
+                                (l1_pgentry_t *)va, ol1e, l1e_empty(), mfn,
+                                d->vcpu[0] /* Change if we go to per-vcpu shadows. */,
+                                0)) )
+    {
+        page_unlock(page);
+        gdprintk(XENLOG_WARNING, "Cannot delete PTE entry at %p\n", va);
+        rc = GNTST_general_error;
+        goto failed;
+    }
+
+    page_unlock(page);
+
+ failed:
+    unmap_domain_page(va);
+    put_page(page);
+    return rc;
+}
+
+
+static int create_grant_va_mapping(unsigned long va, l1_pgentry_t nl1e,
+                                   struct vcpu *v)
+{
+    l1_pgentry_t *pl1e, ol1e;
+    struct domain *d = v->domain;
+    unsigned long gl1mfn;
+    struct page_info *l1pg;
+    int okay;
+
+    adjust_guest_l1e(nl1e, d);
+
+    pl1e = pv_map_guest_l1e(va, &gl1mfn);
+    if ( !pl1e )
+    {
+        gdprintk(XENLOG_WARNING, "Could not find L1 PTE for address %lx\n", va);
+        return GNTST_general_error;
+    }
+
+    if ( get_page_from_pagenr(gl1mfn, current->domain) )
+    {
+        pv_unmap_guest_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    l1pg = mfn_to_page(gl1mfn);
+    if ( !page_lock(l1pg) )
+    {
+        put_page(l1pg);
+        pv_unmap_guest_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(l1pg);
+        put_page(l1pg);
+        pv_unmap_guest_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    ol1e = *pl1e;
+    okay = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0);
+
+    page_unlock(l1pg);
+    put_page(l1pg);
+    pv_unmap_guest_l1e(pl1e);
+
+    if ( okay )
+        put_page_from_l1e(ol1e, d);
+
+    return okay ? GNTST_okay : GNTST_general_error;
+}
+
+static int replace_grant_va_mapping(unsigned long addr, unsigned long frame,
+                                    l1_pgentry_t nl1e, struct vcpu *v)
+{
+    l1_pgentry_t *pl1e, ol1e;
+    unsigned long gl1mfn;
+    struct page_info *l1pg;
+    int rc = 0;
+
+    pl1e = pv_map_guest_l1e(addr, &gl1mfn);
+    if ( !pl1e )
+    {
+        gdprintk(XENLOG_WARNING, "Could not find L1 PTE for address %lx\n", addr);
+        return GNTST_general_error;
+    }
+
+    if ( get_page_from_pagenr(gl1mfn, current->domain) )
+    {
+        rc = GNTST_general_error;
+        goto out;
+    }
+
+    l1pg = mfn_to_page(gl1mfn);
+    if ( !page_lock(l1pg) )
+    {
+        rc = GNTST_general_error;
+        put_page(l1pg);
+        goto out;
+    }
+
+    if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        rc = GNTST_general_error;
+        goto unlock_and_out;
+    }
+
+    ol1e = *pl1e;
+
+    /* Check that the virtual address supplied is actually mapped to frame. */
+    if ( unlikely(l1e_get_pfn(ol1e) != frame) )
+    {
+        gdprintk(XENLOG_WARNING,
+                 "PTE entry %lx for address %lx doesn't match frame %lx\n",
+                 l1e_get_pfn(ol1e), addr, frame);
+        rc = GNTST_general_error;
+        goto unlock_and_out;
+    }
+
+    /* Delete pagetable entry. */
+    if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0)) )
+    {
+        gdprintk(XENLOG_WARNING, "Cannot delete PTE entry at %p\n", pl1e);
+        rc = GNTST_general_error;
+        goto unlock_and_out;
+    }
+
+ unlock_and_out:
+    page_unlock(l1pg);
+    put_page(l1pg);
+ out:
+    pv_unmap_guest_l1e(pl1e);
+    return rc;
+}
+
+static int destroy_grant_va_mapping(unsigned long addr, unsigned long frame,
+                                    struct vcpu *v)
+{
+    return replace_grant_va_mapping(addr, frame, l1e_empty(), v);
+}
+
+int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
+                            unsigned int flags, unsigned int cache_flags)
+{
+    l1_pgentry_t pte;
+    uint32_t grant_pte_flags;
+
+    grant_pte_flags =
+        _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_GNTTAB;
+    if ( cpu_has_nx )
+        grant_pte_flags |= _PAGE_NX_BIT;
+
+    pte = l1e_from_pfn(frame, grant_pte_flags);
+    if ( (flags & GNTMAP_application_map) )
+        l1e_add_flags(pte,_PAGE_USER);
+    if ( !(flags & GNTMAP_readonly) )
+        l1e_add_flags(pte,_PAGE_RW);
+
+    l1e_add_flags(pte,
+                  ((flags >> _GNTMAP_guest_avail0) * _PAGE_AVAIL0)
+                   & _PAGE_AVAIL);
+
+    l1e_add_flags(pte, cacheattr_to_pte_flags(cache_flags >> 5));
+
+    if ( flags & GNTMAP_contains_pte )
+        return create_grant_pte_mapping(addr, pte, current);
+    return create_grant_va_mapping(addr, pte, current);
+}
+
+int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
+                             uint64_t new_addr, unsigned int flags)
+{
+    struct vcpu *curr = current;
+    l1_pgentry_t *pl1e, ol1e;
+    unsigned long gl1mfn;
+    struct page_info *l1pg;
+    int rc;
+
+    if ( flags & GNTMAP_contains_pte )
+    {
+        if ( !new_addr )
+            return destroy_grant_pte_mapping(addr, frame, curr->domain);
+
+        return GNTST_general_error;
+    }
+
+    if ( !new_addr )
+        return destroy_grant_va_mapping(addr, frame, curr);
+
+    pl1e = pv_map_guest_l1e(new_addr, &gl1mfn);
+    if ( !pl1e )
+    {
+        gdprintk(XENLOG_WARNING,
+                 "Could not find L1 PTE for address %"PRIx64"\n", new_addr);
+        return GNTST_general_error;
+    }
+
+    if ( get_page_from_pagenr(gl1mfn, current->domain) )
+    {
+        pv_unmap_guest_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    l1pg = mfn_to_page(gl1mfn);
+    if ( !page_lock(l1pg) )
+    {
+        put_page(l1pg);
+        pv_unmap_guest_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(l1pg);
+        put_page(l1pg);
+        pv_unmap_guest_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    ol1e = *pl1e;
+
+    if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, l1e_empty(),
+                                gl1mfn, curr, 0)) )
+    {
+        page_unlock(l1pg);
+        put_page(l1pg);
+        gdprintk(XENLOG_WARNING, "Cannot delete PTE entry at %p\n", pl1e);
+        pv_unmap_guest_l1e(pl1e);
+        return GNTST_general_error;
+    }
+
+    page_unlock(l1pg);
+    put_page(l1pg);
+    pv_unmap_guest_l1e(pl1e);
+
+    rc = replace_grant_va_mapping(addr, frame, ol1e, curr);
+    if ( rc )
+        put_page_from_l1e(ol1e, curr->domain);
+
+    return rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 16/21] x86/mm: split out descriptor table code
  2017-07-20 16:04 [PATCH v3 00/21] x86: refactor mm.c (the easy part) Wei Liu
                   ` (14 preceding siblings ...)
  2017-07-20 16:04 ` [PATCH v3 15/21] x86/mm: split out PV grant table code Wei Liu
@ 2017-07-20 16:04 ` Wei Liu
  2017-07-20 16:04 ` [PATCH v3 17/21] x86/mm: move compat descriptor handling code Wei Liu
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-20 16:04 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Move the code to pv/descriptor-tables.c. Add the "pv_" prefix to
{set,destroy}_gdt and fix up all call sites. Move the declarations to a
new header file. Fix coding style issues while moving the code.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
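Note for reviewers: the GDT sizing arithmetic moves verbatim. Since a
GDT page holds 512 8-byte descriptors and entries is capped at
FIRST_RESERVED_GDT_ENTRY, the frames[16] arrays in the hypercall
wrappers stay large enough. A quick illustration (the numbers are
illustrative, not from the patch):

    /* Round a descriptor count up to whole GDT pages. */
    unsigned int entries  = 1000;
    unsigned int nr_pages = (entries + 511) / 512;   /* == 2 */

    /*
     * With FIRST_RESERVED_GDT_ENTRY == FIRST_RESERVED_GDT_PAGE * 512
     * (14 pages, if I have the constants right), nr_pages can never
     * exceed 14, which fits the 16-slot frames[] arrays.
     */
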
 xen/arch/x86/domain.c               |  11 ++-
 xen/arch/x86/mm.c                   | 156 ------------------------------
 xen/arch/x86/pv/Makefile            |   1 +
 xen/arch/x86/pv/descriptor-tables.c | 188 ++++++++++++++++++++++++++++++++++++
 xen/arch/x86/x86_64/compat/mm.c     |   6 +-
 xen/include/asm-x86/processor.h     |   5 -
 xen/include/asm-x86/pv/processor.h  |  40 ++++++++
 7 files changed, 239 insertions(+), 168 deletions(-)
 create mode 100644 xen/arch/x86/pv/descriptor-tables.c
 create mode 100644 xen/include/asm-x86/pv/processor.h

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index dd8bf1302f..ff6b579b0b 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -64,6 +64,7 @@
 #include <compat/vcpu.h>
 #include <asm/psr.h>
 #include <asm/pv/domain.h>
+#include <asm/pv/processor.h>
 
 DEFINE_PER_CPU(struct vcpu *, curr_vcpu);
 
@@ -987,7 +988,7 @@ int arch_set_info_guest(
         return rc;
 
     if ( !compat )
-        rc = (int)set_gdt(v, c.nat->gdt_frames, c.nat->gdt_ents);
+        rc = (int)pv_set_gdt(v, c.nat->gdt_frames, c.nat->gdt_ents);
     else
     {
         unsigned long gdt_frames[ARRAY_SIZE(v->arch.pv_vcpu.gdt_frames)];
@@ -997,7 +998,7 @@ int arch_set_info_guest(
             return -EINVAL;
         for ( i = 0; i < n; ++i )
             gdt_frames[i] = c.cmp->gdt_frames[i];
-        rc = (int)set_gdt(v, gdt_frames, c.cmp->gdt_ents);
+        rc = (int)pv_set_gdt(v, gdt_frames, c.cmp->gdt_ents);
     }
     if ( rc != 0 )
         return rc;
@@ -1096,7 +1097,7 @@ int arch_set_info_guest(
     {
         if ( cr3_page )
             put_page(cr3_page);
-        destroy_gdt(v);
+        pv_destroy_gdt(v);
         return rc;
     }
 
@@ -1148,7 +1149,7 @@ int arch_vcpu_reset(struct vcpu *v)
 {
     if ( is_pv_vcpu(v) )
     {
-        destroy_gdt(v);
+        pv_destroy_gdt(v);
         return vcpu_destroy_pagetables(v);
     }
 
@@ -1893,7 +1894,7 @@ int domain_relinquish_resources(struct domain *d)
                  * the LDT as it automatically gets squashed with the guest
                  * mappings.
                  */
-                destroy_gdt(v);
+                pv_destroy_gdt(v);
             }
         }
 
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 897db4cfb9..1a9517fda8 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4066,162 +4066,6 @@ long do_update_va_mapping_otherdomain(unsigned long va, u64 val64,
 }
 
 
-
-/*************************
- * Descriptor Tables
- */
-
-void destroy_gdt(struct vcpu *v)
-{
-    l1_pgentry_t *pl1e;
-    unsigned int i;
-    unsigned long pfn, zero_pfn = PFN_DOWN(__pa(zero_page));
-
-    v->arch.pv_vcpu.gdt_ents = 0;
-    pl1e = gdt_ldt_ptes(v->domain, v);
-    for ( i = 0; i < FIRST_RESERVED_GDT_PAGE; i++ )
-    {
-        pfn = l1e_get_pfn(pl1e[i]);
-        if ( (l1e_get_flags(pl1e[i]) & _PAGE_PRESENT) && pfn != zero_pfn )
-            put_page_and_type(mfn_to_page(pfn));
-        l1e_write(&pl1e[i], l1e_from_pfn(zero_pfn, __PAGE_HYPERVISOR_RO));
-        v->arch.pv_vcpu.gdt_frames[i] = 0;
-    }
-}
-
-
-long set_gdt(struct vcpu *v,
-             unsigned long *frames,
-             unsigned int entries)
-{
-    struct domain *d = v->domain;
-    l1_pgentry_t *pl1e;
-    /* NB. There are 512 8-byte entries per GDT page. */
-    unsigned int i, nr_pages = (entries + 511) / 512;
-
-    if ( entries > FIRST_RESERVED_GDT_ENTRY )
-        return -EINVAL;
-
-    /* Check the pages in the new GDT. */
-    for ( i = 0; i < nr_pages; i++ )
-    {
-        struct page_info *page;
-
-        page = get_page_from_gfn(d, frames[i], NULL, P2M_ALLOC);
-        if ( !page )
-            goto fail;
-        if ( !get_page_type(page, PGT_seg_desc_page) )
-        {
-            put_page(page);
-            goto fail;
-        }
-        frames[i] = page_to_mfn(page);
-    }
-
-    /* Tear down the old GDT. */
-    destroy_gdt(v);
-
-    /* Install the new GDT. */
-    v->arch.pv_vcpu.gdt_ents = entries;
-    pl1e = gdt_ldt_ptes(d, v);
-    for ( i = 0; i < nr_pages; i++ )
-    {
-        v->arch.pv_vcpu.gdt_frames[i] = frames[i];
-        l1e_write(&pl1e[i], l1e_from_pfn(frames[i], __PAGE_HYPERVISOR_RW));
-    }
-
-    return 0;
-
- fail:
-    while ( i-- > 0 )
-    {
-        put_page_and_type(mfn_to_page(frames[i]));
-    }
-    return -EINVAL;
-}
-
-
-long do_set_gdt(XEN_GUEST_HANDLE_PARAM(xen_ulong_t) frame_list,
-                unsigned int entries)
-{
-    int nr_pages = (entries + 511) / 512;
-    unsigned long frames[16];
-    struct vcpu *curr = current;
-    long ret;
-
-    /* Rechecked in set_gdt, but ensures a sane limit for copy_from_user(). */
-    if ( entries > FIRST_RESERVED_GDT_ENTRY )
-        return -EINVAL;
-
-    if ( copy_from_guest(frames, frame_list, nr_pages) )
-        return -EFAULT;
-
-    domain_lock(curr->domain);
-
-    if ( (ret = set_gdt(curr, frames, entries)) == 0 )
-        flush_tlb_local();
-
-    domain_unlock(curr->domain);
-
-    return ret;
-}
-
-
-long do_update_descriptor(u64 pa, u64 desc)
-{
-    struct domain *dom = current->domain;
-    unsigned long gmfn = pa >> PAGE_SHIFT;
-    unsigned long mfn;
-    unsigned int  offset;
-    struct desc_struct *gdt_pent, d;
-    struct page_info *page;
-    long ret = -EINVAL;
-
-    offset = ((unsigned int)pa & ~PAGE_MASK) / sizeof(struct desc_struct);
-
-    *(u64 *)&d = desc;
-
-    page = get_page_from_gfn(dom, gmfn, NULL, P2M_ALLOC);
-    if ( (((unsigned int)pa % sizeof(struct desc_struct)) != 0) ||
-         !page ||
-         !check_descriptor(dom, &d) )
-    {
-        if ( page )
-            put_page(page);
-        return -EINVAL;
-    }
-    mfn = page_to_mfn(page);
-
-    /* Check if the given frame is in use in an unsafe context. */
-    switch ( page->u.inuse.type_info & PGT_type_mask )
-    {
-    case PGT_seg_desc_page:
-        if ( unlikely(!get_page_type(page, PGT_seg_desc_page)) )
-            goto out;
-        break;
-    default:
-        if ( unlikely(!get_page_type(page, PGT_writable_page)) )
-            goto out;
-        break;
-    }
-
-    paging_mark_dirty(dom, _mfn(mfn));
-
-    /* All is good so make the update. */
-    gdt_pent = map_domain_page(_mfn(mfn));
-    write_atomic((uint64_t *)&gdt_pent[offset], *(uint64_t *)&d);
-    unmap_domain_page(gdt_pent);
-
-    put_page_type(page);
-
-    ret = 0; /* success */
-
- out:
-    put_page(page);
-
-    return ret;
-}
-
 typedef struct e820entry e820entry_t;
 DEFINE_XEN_GUEST_HANDLE(e820entry_t);
 
diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile
index 501c766cc2..42e9d3723b 100644
--- a/xen/arch/x86/pv/Makefile
+++ b/xen/arch/x86/pv/Makefile
@@ -1,4 +1,5 @@
 obj-y += callback.o
+obj-y += descriptor-tables.o
 obj-y += domain.o
 obj-y += emulate.o
 obj-y += emul-gate-op.o
diff --git a/xen/arch/x86/pv/descriptor-tables.c b/xen/arch/x86/pv/descriptor-tables.c
new file mode 100644
index 0000000000..12dc45b671
--- /dev/null
+++ b/xen/arch/x86/pv/descriptor-tables.c
@@ -0,0 +1,188 @@
+/******************************************************************************
+ * arch/x86/pv/descriptor-tables.c
+ *
+ * Descriptor table related code
+ *
+ * Copyright (c) 2002-2005 K A Fraser
+ * Copyright (c) 2004 Christian Limpach
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/guest_access.h>
+#include <xen/hypercall.h>
+
+#include <asm/p2m.h>
+#include <asm/pv/processor.h>
+
+/*************************
+ * Descriptor Tables
+ */
+
+void pv_destroy_gdt(struct vcpu *v)
+{
+    l1_pgentry_t *pl1e;
+    unsigned int i;
+    unsigned long pfn, zero_pfn = PFN_DOWN(__pa(zero_page));
+
+    v->arch.pv_vcpu.gdt_ents = 0;
+    pl1e = gdt_ldt_ptes(v->domain, v);
+    for ( i = 0; i < FIRST_RESERVED_GDT_PAGE; i++ )
+    {
+        pfn = l1e_get_pfn(pl1e[i]);
+        if ( (l1e_get_flags(pl1e[i]) & _PAGE_PRESENT) && pfn != zero_pfn )
+            put_page_and_type(mfn_to_page(pfn));
+        l1e_write(&pl1e[i], l1e_from_pfn(zero_pfn, __PAGE_HYPERVISOR_RO));
+        v->arch.pv_vcpu.gdt_frames[i] = 0;
+    }
+}
+
+long pv_set_gdt(struct vcpu *v, unsigned long *frames, unsigned int entries)
+{
+    struct domain *d = v->domain;
+    l1_pgentry_t *pl1e;
+    /* NB. There are 512 8-byte entries per GDT page. */
+    unsigned int i, nr_pages = (entries + 511) / 512;
+
+    if ( entries > FIRST_RESERVED_GDT_ENTRY )
+        return -EINVAL;
+
+    /* Check the pages in the new GDT. */
+    for ( i = 0; i < nr_pages; i++ )
+    {
+        struct page_info *page;
+
+        page = get_page_from_gfn(d, frames[i], NULL, P2M_ALLOC);
+        if ( !page )
+            goto fail;
+        if ( !get_page_type(page, PGT_seg_desc_page) )
+        {
+            put_page(page);
+            goto fail;
+        }
+        frames[i] = page_to_mfn(page);
+    }
+
+    /* Tear down the old GDT. */
+    pv_destroy_gdt(v);
+
+    /* Install the new GDT. */
+    v->arch.pv_vcpu.gdt_ents = entries;
+    pl1e = gdt_ldt_ptes(d, v);
+    for ( i = 0; i < nr_pages; i++ )
+    {
+        v->arch.pv_vcpu.gdt_frames[i] = frames[i];
+        l1e_write(&pl1e[i], l1e_from_pfn(frames[i], __PAGE_HYPERVISOR_RW));
+    }
+
+    return 0;
+
+ fail:
+    while ( i-- > 0 )
+    {
+        put_page_and_type(mfn_to_page(frames[i]));
+    }
+    return -EINVAL;
+}
+
+
+long do_set_gdt(XEN_GUEST_HANDLE_PARAM(xen_ulong_t) frame_list,
+                unsigned int entries)
+{
+    int nr_pages = (entries + 511) / 512;
+    unsigned long frames[16];
+    struct vcpu *curr = current;
+    long ret;
+
+    /* Rechecked in pv_set_gdt, but ensures a sane limit for copy_from_user(). */
+    if ( entries > FIRST_RESERVED_GDT_ENTRY )
+        return -EINVAL;
+
+    if ( copy_from_guest(frames, frame_list, nr_pages) )
+        return -EFAULT;
+
+    domain_lock(curr->domain);
+
+    if ( (ret = pv_set_gdt(curr, frames, entries)) == 0 )
+        flush_tlb_local();
+
+    domain_unlock(curr->domain);
+
+    return ret;
+}
+
+long do_update_descriptor(u64 pa, u64 desc)
+{
+    struct domain *dom = current->domain;
+    unsigned long gmfn = pa >> PAGE_SHIFT;
+    unsigned long mfn;
+    unsigned int  offset;
+    struct desc_struct *gdt_pent, d;
+    struct page_info *page;
+    long ret = -EINVAL;
+
+    offset = ((unsigned int)pa & ~PAGE_MASK) / sizeof(struct desc_struct);
+
+    *(u64 *)&d = desc;
+
+    page = get_page_from_gfn(dom, gmfn, NULL, P2M_ALLOC);
+    if ( (((unsigned int)pa % sizeof(struct desc_struct)) != 0) ||
+         !page ||
+         !check_descriptor(dom, &d) )
+    {
+        if ( page )
+            put_page(page);
+        return -EINVAL;
+    }
+    mfn = page_to_mfn(page);
+
+    /* Check if the given frame is in use in an unsafe context. */
+    switch ( page->u.inuse.type_info & PGT_type_mask )
+    {
+    case PGT_seg_desc_page:
+        if ( unlikely(!get_page_type(page, PGT_seg_desc_page)) )
+            goto out;
+        break;
+    default:
+        if ( unlikely(!get_page_type(page, PGT_writable_page)) )
+            goto out;
+        break;
+    }
+
+    paging_mark_dirty(dom, _mfn(mfn));
+
+    /* All is good so make the update. */
+    gdt_pent = map_domain_page(_mfn(mfn));
+    write_atomic((uint64_t *)&gdt_pent[offset], *(uint64_t *)&d);
+    unmap_domain_page(gdt_pent);
+
+    put_page_type(page);
+
+    ret = 0; /* success */
+
+ out:
+    put_page(page);
+
+    return ret;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/x86_64/compat/mm.c b/xen/arch/x86/x86_64/compat/mm.c
index b737af1888..dc41f61c71 100644
--- a/xen/arch/x86/x86_64/compat/mm.c
+++ b/xen/arch/x86/x86_64/compat/mm.c
@@ -6,13 +6,15 @@
 #include <asm/mem_paging.h>
 #include <asm/mem_sharing.h>
 
+#include <asm/pv/processor.h>
+
 int compat_set_gdt(XEN_GUEST_HANDLE_PARAM(uint) frame_list, unsigned int entries)
 {
     unsigned int i, nr_pages = (entries + 511) / 512;
     unsigned long frames[16];
     long ret;
 
-    /* Rechecked in set_gdt, but ensures a sane limit for copy_from_user(). */
+    /* Rechecked in pv_set_gdt, but ensures a sane limit for copy_from_user(). */
     if ( entries > FIRST_RESERVED_GDT_ENTRY )
         return -EINVAL;
 
@@ -31,7 +33,7 @@ int compat_set_gdt(XEN_GUEST_HANDLE_PARAM(uint) frame_list, unsigned int entries
 
     domain_lock(current->domain);
 
-    if ( (ret = set_gdt(current, frames, entries)) == 0 )
+    if ( (ret = pv_set_gdt(current, frames, entries)) == 0 )
         flush_tlb_local();
 
     domain_unlock(current->domain);
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 5bf56b45e1..1463a3acb7 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -459,11 +459,6 @@ extern void init_int80_direct_trap(struct vcpu *v);
 
 extern void write_ptbase(struct vcpu *v);
 
-void destroy_gdt(struct vcpu *d);
-long set_gdt(struct vcpu *d, 
-             unsigned long *frames, 
-             unsigned int entries);
-
 /* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */
 static always_inline void rep_nop(void)
 {
diff --git a/xen/include/asm-x86/pv/processor.h b/xen/include/asm-x86/pv/processor.h
new file mode 100644
index 0000000000..8ab5773871
--- /dev/null
+++ b/xen/include/asm-x86/pv/processor.h
@@ -0,0 +1,40 @@
+/*
+ * asm-x86/pv/processor.h
+ *
+ * Vcpu interfaces for PV guests
+ *
+ * Copyright (C) 2017 Wei Liu <wei.liu2@citrix.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __X86_PV_PROCESSOR_H__
+#define __X86_PV_PROCESSOR_H__
+
+#ifdef CONFIG_PV
+
+void pv_destroy_gdt(struct vcpu *d);
+long pv_set_gdt(struct vcpu *d, unsigned long *frames, unsigned int entries);
+
+#else
+
+#include <xen/errno.h>
+
+static inline void pv_destroy_gdt(struct vcpu *d) {}
+static inline long pv_set_gdt(struct vcpu *d, unsigned long *frames,
+                              unsigned int entries)
+{ return -EINVAL; }
+
+#endif
+
+#endif /* __X86_PV_PROCESSOR_H__ */
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 17/21] x86/mm: move compat descriptor handling code
  2017-07-20 16:04 [PATCH v3 00/21] x86: refactor mm.c (the easy part) Wei Liu
                   ` (15 preceding siblings ...)
  2017-07-20 16:04 ` [PATCH v3 16/21] x86/mm: split out descriptor " Wei Liu
@ 2017-07-20 16:04 ` Wei Liu
  2017-07-20 16:04 ` [PATCH v3 18/21] x86/mm: move and rename map_ldt_shadow_page Wei Liu
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-20 16:04 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Move them alongside the non-compat variants.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
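The only non-mechanical part is compat_update_descriptor's widening of
its 32-bit halves, which is unchanged; a self-contained sketch of the
same expression with made-up values:

    #include <stdint.h>

    uint32_t pa_lo = 0x2000, pa_hi = 0x1;
    /* Combine the lo/hi halves exactly as the compat wrapper does. */
    uint64_t pa = pa_lo | ((uint64_t)pa_hi << 32);   /* 0x100002000 */
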
 xen/arch/x86/pv/descriptor-tables.c | 40 ++++++++++++++++++++++++++++++++++++
 xen/arch/x86/x86_64/compat/mm.c     | 41 -------------------------------------
 2 files changed, 40 insertions(+), 41 deletions(-)

diff --git a/xen/arch/x86/pv/descriptor-tables.c b/xen/arch/x86/pv/descriptor-tables.c
index 12dc45b671..a302812774 100644
--- a/xen/arch/x86/pv/descriptor-tables.c
+++ b/xen/arch/x86/pv/descriptor-tables.c
@@ -177,6 +177,46 @@ long do_update_descriptor(u64 pa, u64 desc)
     return ret;
 }
 
+int compat_set_gdt(XEN_GUEST_HANDLE_PARAM(uint) frame_list,
+                   unsigned int entries)
+{
+    unsigned int i, nr_pages = (entries + 511) / 512;
+    unsigned long frames[16];
+    long ret;
+
+    /* Rechecked in pv_set_gdt, but ensures a sane limit for copy_from_user(). */
+    if ( entries > FIRST_RESERVED_GDT_ENTRY )
+        return -EINVAL;
+
+    if ( !guest_handle_okay(frame_list, nr_pages) )
+        return -EFAULT;
+
+    for ( i = 0; i < nr_pages; ++i )
+    {
+        unsigned int frame;
+
+        if ( __copy_from_guest(&frame, frame_list, 1) )
+            return -EFAULT;
+        frames[i] = frame;
+        guest_handle_add_offset(frame_list, 1);
+    }
+
+    domain_lock(current->domain);
+
+    if ( (ret = pv_set_gdt(current, frames, entries)) == 0 )
+        flush_tlb_local();
+
+    domain_unlock(current->domain);
+
+    return ret;
+}
+
+int compat_update_descriptor(u32 pa_lo, u32 pa_hi, u32 desc_lo, u32 desc_hi)
+{
+    return do_update_descriptor(pa_lo | ((u64)pa_hi << 32),
+                                desc_lo | ((u64)desc_hi << 32));
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/x86_64/compat/mm.c b/xen/arch/x86/x86_64/compat/mm.c
index dc41f61c71..df91020620 100644
--- a/xen/arch/x86/x86_64/compat/mm.c
+++ b/xen/arch/x86/x86_64/compat/mm.c
@@ -6,47 +6,6 @@
 #include <asm/mem_paging.h>
 #include <asm/mem_sharing.h>
 
-#include <asm/pv/processor.h>
-
-int compat_set_gdt(XEN_GUEST_HANDLE_PARAM(uint) frame_list, unsigned int entries)
-{
-    unsigned int i, nr_pages = (entries + 511) / 512;
-    unsigned long frames[16];
-    long ret;
-
-    /* Rechecked in pv_set_gdt, but ensures a sane limit for copy_from_user(). */
-    if ( entries > FIRST_RESERVED_GDT_ENTRY )
-        return -EINVAL;
-
-    if ( !guest_handle_okay(frame_list, nr_pages) )
-        return -EFAULT;
-
-    for ( i = 0; i < nr_pages; ++i )
-    {
-        unsigned int frame;
-
-        if ( __copy_from_guest(&frame, frame_list, 1) )
-            return -EFAULT;
-        frames[i] = frame;
-        guest_handle_add_offset(frame_list, 1);
-    }
-
-    domain_lock(current->domain);
-
-    if ( (ret = pv_set_gdt(current, frames, entries)) == 0 )
-        flush_tlb_local();
-
-    domain_unlock(current->domain);
-
-    return ret;
-}
-
-int compat_update_descriptor(u32 pa_lo, u32 pa_hi, u32 desc_lo, u32 desc_hi)
-{
-    return do_update_descriptor(pa_lo | ((u64)pa_hi << 32),
-                                desc_lo | ((u64)desc_hi << 32));
-}
-
 int compat_arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct compat_machphys_mfn_list xmml;
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 18/21] x86/mm: move and rename map_ldt_shadow_page
  2017-07-20 16:04 [PATCH v3 00/21] x86: refactor mm.c (the easy part) Wei Liu
                   ` (16 preceding siblings ...)
  2017-07-20 16:04 ` [PATCH v3 17/21] x86/mm: move compat descriptor handling code Wei Liu
@ 2017-07-20 16:04 ` Wei Liu
  2017-07-20 16:04 ` [PATCH v3 19/21] x86/mm: factor out pv_arch_init_memory Wei Liu
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-20 16:04 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Move the function to pv/descriptor-tables.c and rename it to
pv_map_ldt_shadow_page. Take the chance to change v to curr and d to
currd in the code, and change the return type to bool. Fix up all the
call sites.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
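The offset-to-GVA translation at the top of the function is unchanged
by the move; sketched here with made-up numbers (ldt_base stands in for
curr->arch.pv_vcpu.ldt_base):

    /*
     * off is a page index into the vcpu's shadow LDT area; the guest
     * virtual address of the backing LDT page is that many pages past
     * the LDT base (PAGE_SHIFT == 12 on x86).
     */
    unsigned int off = 3;                   /* from the fault handler */
    unsigned long gva = ldt_base + (off << PAGE_SHIFT); /* base + 0x3000 */
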
 xen/arch/x86/mm.c                   | 42 -------------------------------------
 xen/arch/x86/pv/descriptor-tables.c | 42 +++++++++++++++++++++++++++++++++++++
 xen/arch/x86/traps.c                |  5 +++--
 xen/include/asm-x86/mm.h            |  2 --
 xen/include/asm-x86/pv/processor.h  |  2 ++
 5 files changed, 47 insertions(+), 46 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 1a9517fda8..109a109155 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -604,48 +604,6 @@ static int alloc_segdesc_page(struct page_info *page)
 }
 
 
-/* Map shadow page at offset @off. */
-int map_ldt_shadow_page(unsigned int off)
-{
-    struct vcpu *v = current;
-    struct domain *d = v->domain;
-    unsigned long gmfn;
-    struct page_info *page;
-    l1_pgentry_t l1e, nl1e;
-    unsigned long gva = v->arch.pv_vcpu.ldt_base + (off << PAGE_SHIFT);
-    int okay;
-
-    BUG_ON(unlikely(in_irq()));
-
-    if ( is_pv_32bit_domain(d) )
-        gva = (u32)gva;
-    pv_get_guest_eff_kern_l1e(v, gva, &l1e);
-    if ( unlikely(!(l1e_get_flags(l1e) & _PAGE_PRESENT)) )
-        return 0;
-
-    gmfn = l1e_get_pfn(l1e);
-    page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
-    if ( unlikely(!page) )
-        return 0;
-
-    okay = get_page_type(page, PGT_seg_desc_page);
-    if ( unlikely(!okay) )
-    {
-        put_page(page);
-        return 0;
-    }
-
-    nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(l1e) | _PAGE_RW);
-
-    spin_lock(&v->arch.pv_vcpu.shadow_ldt_lock);
-    l1e_write(&gdt_ldt_ptes(d, v)[off + 16], nl1e);
-    v->arch.pv_vcpu.shadow_ldt_mapcnt++;
-    spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock);
-
-    return 1;
-}
-
-
 int get_page_from_pagenr(unsigned long page_nr, struct domain *d)
 {
     struct page_info *page = mfn_to_page(page_nr);
diff --git a/xen/arch/x86/pv/descriptor-tables.c b/xen/arch/x86/pv/descriptor-tables.c
index a302812774..6ac5c736cf 100644
--- a/xen/arch/x86/pv/descriptor-tables.c
+++ b/xen/arch/x86/pv/descriptor-tables.c
@@ -24,6 +24,7 @@
 #include <xen/hypercall.h>
 
 #include <asm/p2m.h>
+#include <asm/pv/mm.h>
 #include <asm/pv/processor.h>
 
 /*************************
@@ -217,6 +218,47 @@ int compat_update_descriptor(u32 pa_lo, u32 pa_hi, u32 desc_lo, u32 desc_hi)
                                 desc_lo | ((u64)desc_hi << 32));
 }
 
+/* Map shadow page at offset @off. */
+bool pv_map_ldt_shadow_page(unsigned int off)
+{
+    struct vcpu *curr = current;
+    struct domain *currd = curr->domain;
+    unsigned long gmfn;
+    struct page_info *page;
+    l1_pgentry_t l1e, nl1e;
+    unsigned long gva = curr->arch.pv_vcpu.ldt_base + (off << PAGE_SHIFT);
+    int okay;
+
+    BUG_ON(unlikely(in_irq()));
+
+    if ( is_pv_32bit_domain(currd) )
+        gva = (u32)gva;
+    pv_get_guest_eff_kern_l1e(curr, gva, &l1e);
+    if ( unlikely(!(l1e_get_flags(l1e) & _PAGE_PRESENT)) )
+        return false;
+
+    gmfn = l1e_get_pfn(l1e);
+    page = get_page_from_gfn(currd, gmfn, NULL, P2M_ALLOC);
+    if ( unlikely(!page) )
+        return false;
+
+    okay = get_page_type(page, PGT_seg_desc_page);
+    if ( unlikely(!okay) )
+    {
+        put_page(page);
+        return false;
+    }
+
+    nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(l1e) | _PAGE_RW);
+
+    spin_lock(&curr->arch.pv_vcpu.shadow_ldt_lock);
+    l1e_write(&gdt_ldt_ptes(currd, curr)[off + 16], nl1e);
+    curr->arch.pv_vcpu.shadow_ldt_mapcnt++;
+    spin_unlock(&curr->arch.pv_vcpu.shadow_ldt_lock);
+
+    return true;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index b93b3d1317..dbdcdf62a6 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -77,6 +77,7 @@
 #include <public/arch-x86/cpuid.h>
 #include <asm/cpuid.h>
 #include <xsm/xsm.h>
+#include <asm/pv/processor.h>
 #include <asm/pv/traps.h>
 
 /*
@@ -1100,7 +1101,7 @@ static int handle_gdt_ldt_mapping_fault(unsigned long offset,
     /*
      * If the fault is in another vcpu's area, it cannot be due to
      * a GDT/LDT descriptor load. Thus we can reasonably exit immediately, and
-     * indeed we have to since map_ldt_shadow_page() works correctly only on
+     * indeed we have to since pv_map_ldt_shadow_page() works correctly only on
      * accesses to a vcpu's own area.
      */
     if ( vcpu_area != curr->vcpu_id )
@@ -1112,7 +1113,7 @@ static int handle_gdt_ldt_mapping_fault(unsigned long offset,
     if ( likely(is_ldt_area) )
     {
         /* LDT fault: Copy a mapping from the guest's LDT, if it is valid. */
-        if ( likely(map_ldt_shadow_page(offset >> PAGE_SHIFT)) )
+        if ( likely(pv_map_ldt_shadow_page(offset >> PAGE_SHIFT)) )
         {
             if ( guest_mode(regs) )
                 trace_trap_two_addr(TRC_PV_GDT_LDT_MAPPING_FAULT,
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 6fc1e7d5ca..07287d97ca 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -550,8 +550,6 @@ long subarch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg);
 int compat_arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void));
 int compat_subarch_memory_op(int op, XEN_GUEST_HANDLE_PARAM(void));
 
-int map_ldt_shadow_page(unsigned int);
-
 #define NIL(type) ((type *)-sizeof(type))
 #define IS_NIL(ptr) (!((uintptr_t)(ptr) + sizeof(*(ptr))))
 
diff --git a/xen/include/asm-x86/pv/processor.h b/xen/include/asm-x86/pv/processor.h
index 8ab5773871..6f9e1afe8a 100644
--- a/xen/include/asm-x86/pv/processor.h
+++ b/xen/include/asm-x86/pv/processor.h
@@ -25,6 +25,7 @@
 
 void pv_destroy_gdt(struct vcpu *d);
 long pv_set_gdt(struct vcpu *d, unsigned long *frames, unsigned int entries);
+bool pv_map_ldt_shadow_page(unsigned int);
 
 #else
 
@@ -34,6 +35,7 @@ static inline void pv_destroy_gdt(struct vcpu *d) {}
 static inline long pv_set_gdt(struct vcpu *d, unsigned long *frames,
                               unsigned int entries)
 { return -EINVAL; }
+static inline bool pv_map_ldt_shadow_page(unsigned int off) { return false; }
 
 #endif
 
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 19/21] x86/mm: factor out pv_arch_init_memory
  2017-07-20 16:04 [PATCH v3 00/21] x86: refactor mm.c (the easy part) Wei Liu
                   ` (17 preceding siblings ...)
  2017-07-20 16:04 ` [PATCH v3 18/21] x86/mm: move and rename map_ldt_shadow_page Wei Liu
@ 2017-07-20 16:04 ` Wei Liu
  2017-07-20 16:04 ` [PATCH v3 20/21] x86/mm: move l4 table setup code Wei Liu
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-20 16:04 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Move the split l4 setup code into the new function. The new function
is also going to contain other PV-specific setup code in a later patch.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
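The moved block only has an effect in debug builds with highmem_start
set; the slot arithmetic it relies on is plain 4-level paging. A sketch
with an illustrative address (assuming the usual direct-map layout, not
taken from the patch):

    /* l4_table_offset(va): bits 47..39 select one of 512 L4 slots. */
    unsigned long va   = 0xffff830100000000UL;   /* example only */
    unsigned int  slot = (va >> 39) & 0x1ff;     /* == 262 */

    /*
     * root_pgt_pv_xen_slots then becomes
     * slot - ROOT_PAGETABLE_FIRST_XEN_SLOT, i.e. 262 - 256 = 6 Xen L4
     * slots copied into PV guest page tables (if I have the slot
     * constant right).
     */
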
 xen/arch/x86/mm.c | 73 ++++++++++++++++++++++++++++++-------------------------
 1 file changed, 40 insertions(+), 33 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 109a109155..c7c989d8f8 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -259,6 +259,45 @@ static l4_pgentry_t __read_mostly split_l4e;
 #define root_pgt_pv_xen_slots ROOT_PAGETABLE_PV_XEN_SLOTS
 #endif
 
+static void pv_arch_init_memory(void)
+{
+#ifndef NDEBUG
+    unsigned int i;
+
+    if ( highmem_start )
+    {
+        unsigned long split_va = (unsigned long)__va(highmem_start);
+
+        if ( split_va < HYPERVISOR_VIRT_END &&
+             split_va - 1 == (unsigned long)__va(highmem_start - 1) )
+        {
+            root_pgt_pv_xen_slots = l4_table_offset(split_va) -
+                                    ROOT_PAGETABLE_FIRST_XEN_SLOT;
+            ASSERT(root_pgt_pv_xen_slots < ROOT_PAGETABLE_PV_XEN_SLOTS);
+            if ( l4_table_offset(split_va) == l4_table_offset(split_va - 1) )
+            {
+                l3_pgentry_t *l3tab = alloc_xen_pagetable();
+
+                if ( l3tab )
+                {
+                    const l3_pgentry_t *l3idle =
+                        l4e_to_l3e(idle_pg_table[l4_table_offset(split_va)]);
+
+                    for ( i = 0; i < l3_table_offset(split_va); ++i )
+                        l3tab[i] = l3idle[i];
+                    for ( ; i < L3_PAGETABLE_ENTRIES; ++i )
+                        l3tab[i] = l3e_empty();
+                    split_l4e = l4e_from_pfn(virt_to_mfn(l3tab),
+                                             __PAGE_HYPERVISOR_RW);
+                }
+                else
+                    ++root_pgt_pv_xen_slots;
+            }
+        }
+    }
+#endif
+}
+
 void __init arch_init_memory(void)
 {
     unsigned long i, pfn, rstart_pfn, rend_pfn, iostart_pfn, ioend_pfn;
@@ -353,39 +392,7 @@ void __init arch_init_memory(void)
 
     mem_sharing_init();
 
-#ifndef NDEBUG
-    if ( highmem_start )
-    {
-        unsigned long split_va = (unsigned long)__va(highmem_start);
-
-        if ( split_va < HYPERVISOR_VIRT_END &&
-             split_va - 1 == (unsigned long)__va(highmem_start - 1) )
-        {
-            root_pgt_pv_xen_slots = l4_table_offset(split_va) -
-                                    ROOT_PAGETABLE_FIRST_XEN_SLOT;
-            ASSERT(root_pgt_pv_xen_slots < ROOT_PAGETABLE_PV_XEN_SLOTS);
-            if ( l4_table_offset(split_va) == l4_table_offset(split_va - 1) )
-            {
-                l3_pgentry_t *l3tab = alloc_xen_pagetable();
-
-                if ( l3tab )
-                {
-                    const l3_pgentry_t *l3idle =
-                        l4e_to_l3e(idle_pg_table[l4_table_offset(split_va)]);
-
-                    for ( i = 0; i < l3_table_offset(split_va); ++i )
-                        l3tab[i] = l3idle[i];
-                    for ( ; i < L3_PAGETABLE_ENTRIES; ++i )
-                        l3tab[i] = l3e_empty();
-                    split_l4e = l4e_from_pfn(virt_to_mfn(l3tab),
-                                             __PAGE_HYPERVISOR_RW);
-                }
-                else
-                    ++root_pgt_pv_xen_slots;
-            }
-        }
-    }
-#endif
+    pv_arch_init_memory();
 }
 
 int page_is_ram_type(unsigned long mfn, unsigned long mem_type)
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 20/21] x86/mm: move l4 table setup code
  2017-07-20 16:04 [PATCH v3 00/21] x86: refactor mm.c (the easy part) Wei Liu
                   ` (18 preceding siblings ...)
  2017-07-20 16:04 ` [PATCH v3 19/21] x86/mm: factor out pv_arch_init_memory Wei Liu
@ 2017-07-20 16:04 ` Wei Liu
  2017-07-20 16:04 ` [PATCH v3 21/21] x86/mm: add "pv_" prefix to new_guest_cr3 Wei Liu
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-20 16:04 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Move two functions to pv/mm.c. Add the "pv_" prefix to
init_guest_l4_table. Export both via pv/mm.h. Fix up call sites.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
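One invariant worth keeping in mind while reviewing the move: the
LINEAR_PT slot that pv_init_guest_l4_table installs points the L4 page
back at itself, which is what makes the linear pagetable trick work.
Sketched as an assertion (not code from the patch):

    /*
     * After pv_init_guest_l4_table(l4tab, d, zap): the linear slot
     * maps the L4 page itself, so the L1 entry for any guest va can
     * be read through the recursive mapping.
     */
    ASSERT(l4e_get_pfn(l4tab[l4_table_offset(LINEAR_PT_VIRT_START)]) ==
           domain_page_map_to_mfn(l4tab));
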
 xen/arch/x86/mm.c            | 69 +-------------------------------------------
 xen/arch/x86/pv/dom0_build.c |  3 +-
 xen/arch/x86/pv/domain.c     |  3 +-
 xen/arch/x86/pv/mm.c         | 68 +++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/mm.h     |  2 --
 xen/include/asm-x86/pv/mm.h  |  8 +++++
 6 files changed, 81 insertions(+), 72 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index c7c989d8f8..5687e29824 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -251,53 +251,6 @@ void __init init_frametable(void)
         init_spagetable();
 }
 
-#ifndef NDEBUG
-static unsigned int __read_mostly root_pgt_pv_xen_slots
-    = ROOT_PAGETABLE_PV_XEN_SLOTS;
-static l4_pgentry_t __read_mostly split_l4e;
-#else
-#define root_pgt_pv_xen_slots ROOT_PAGETABLE_PV_XEN_SLOTS
-#endif
-
-static void pv_arch_init_memory(void)
-{
-#ifndef NDEBUG
-    unsigned int i;
-
-    if ( highmem_start )
-    {
-        unsigned long split_va = (unsigned long)__va(highmem_start);
-
-        if ( split_va < HYPERVISOR_VIRT_END &&
-             split_va - 1 == (unsigned long)__va(highmem_start - 1) )
-        {
-            root_pgt_pv_xen_slots = l4_table_offset(split_va) -
-                                    ROOT_PAGETABLE_FIRST_XEN_SLOT;
-            ASSERT(root_pgt_pv_xen_slots < ROOT_PAGETABLE_PV_XEN_SLOTS);
-            if ( l4_table_offset(split_va) == l4_table_offset(split_va - 1) )
-            {
-                l3_pgentry_t *l3tab = alloc_xen_pagetable();
-
-                if ( l3tab )
-                {
-                    const l3_pgentry_t *l3idle =
-                        l4e_to_l3e(idle_pg_table[l4_table_offset(split_va)]);
-
-                    for ( i = 0; i < l3_table_offset(split_va); ++i )
-                        l3tab[i] = l3idle[i];
-                    for ( ; i < L3_PAGETABLE_ENTRIES; ++i )
-                        l3tab[i] = l3e_empty();
-                    split_l4e = l4e_from_pfn(virt_to_mfn(l3tab),
-                                             __PAGE_HYPERVISOR_RW);
-                }
-                else
-                    ++root_pgt_pv_xen_slots;
-            }
-        }
-    }
-#endif
-}
-
 void __init arch_init_memory(void)
 {
     unsigned long i, pfn, rstart_pfn, rend_pfn, iostart_pfn, ioend_pfn;
@@ -1468,26 +1421,6 @@ static int alloc_l3_table(struct page_info *page)
     return rc > 0 ? 0 : rc;
 }
 
-void init_guest_l4_table(l4_pgentry_t l4tab[], const struct domain *d,
-                         bool zap_ro_mpt)
-{
-    /* Xen private mappings. */
-    memcpy(&l4tab[ROOT_PAGETABLE_FIRST_XEN_SLOT],
-           &idle_pg_table[ROOT_PAGETABLE_FIRST_XEN_SLOT],
-           root_pgt_pv_xen_slots * sizeof(l4_pgentry_t));
-#ifndef NDEBUG
-    if ( l4e_get_intpte(split_l4e) )
-        l4tab[ROOT_PAGETABLE_FIRST_XEN_SLOT + root_pgt_pv_xen_slots] =
-            split_l4e;
-#endif
-    l4tab[l4_table_offset(LINEAR_PT_VIRT_START)] =
-        l4e_from_pfn(domain_page_map_to_mfn(l4tab), __PAGE_HYPERVISOR_RW);
-    l4tab[l4_table_offset(PERDOMAIN_VIRT_START)] =
-        l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW);
-    if ( zap_ro_mpt || is_pv_32bit_domain(d) )
-        l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty();
-}
-
 bool fill_ro_mpt(unsigned long mfn)
 {
     l4_pgentry_t *l4tab = map_domain_page(_mfn(mfn));
@@ -1562,7 +1495,7 @@ static int alloc_l4_table(struct page_info *page)
 
     if ( rc >= 0 )
     {
-        init_guest_l4_table(pl4e, d, !VM_ASSIST(d, m2p_strict));
+        pv_init_guest_l4_table(pl4e, d, !VM_ASSIST(d, m2p_strict));
         atomic_inc(&d->arch.pv_domain.nr_l4_pages);
         rc = 0;
     }
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index 18c19a256f..ef789410fe 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -18,6 +18,7 @@
 #include <asm/bzimage.h>
 #include <asm/dom0_build.h>
 #include <asm/page.h>
+#include <asm/pv/mm.h>
 #include <asm/setup.h>
 
 /* Allow ring-3 access in long mode as guest cannot use ring 1 ... */
@@ -590,7 +591,7 @@ int __init dom0_construct_pv(struct domain *d,
         l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
     }
     clear_page(l4tab);
-    init_guest_l4_table(l4tab, d, 0);
+    pv_init_guest_l4_table(l4tab, d, 0);
     v->arch.guest_table = pagetable_from_paddr(__pa(l4start));
     if ( is_pv_32bit_domain(d) )
         v->arch.guest_table_user = v->arch.guest_table;
diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index 6cb61f2e14..415d0634a3 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -10,6 +10,7 @@
 #include <xen/sched.h>
 
 #include <asm/pv/domain.h>
+#include <asm/pv/mm.h>
 
 static void noreturn continue_nonidle_domain(struct vcpu *v)
 {
@@ -29,7 +30,7 @@ static int setup_compat_l4(struct vcpu *v)
 
     l4tab = __map_domain_page(pg);
     clear_page(l4tab);
-    init_guest_l4_table(l4tab, v->domain, 1);
+    pv_init_guest_l4_table(l4tab, v->domain, 1);
     unmap_domain_page(l4tab);
 
     /* This page needs to look like a pagetable so that it can be shadowed */
diff --git a/xen/arch/x86/pv/mm.c b/xen/arch/x86/pv/mm.c
index 32e73d59df..0f4303cef2 100644
--- a/xen/arch/x86/pv/mm.c
+++ b/xen/arch/x86/pv/mm.c
@@ -23,6 +23,7 @@
 #include <xen/guest_access.h>
 
 #include <asm/pv/mm.h>
+#include <asm/setup.h>
 
 /*
  * PTE updates can be done with ordinary writes except:
@@ -32,6 +33,14 @@
 #define PTE_UPDATE_WITH_CMPXCHG
 #endif
 
+#ifndef NDEBUG
+static unsigned int __read_mostly root_pgt_pv_xen_slots
+    = ROOT_PAGETABLE_PV_XEN_SLOTS;
+static l4_pgentry_t __read_mostly split_l4e;
+#else
+#define root_pgt_pv_xen_slots ROOT_PAGETABLE_PV_XEN_SLOTS
+#endif
+
 /* Read a PV guest's l1e that maps this virtual address. */
 void pv_get_guest_eff_l1e(unsigned long addr, l1_pgentry_t *eff_l1e)
 {
@@ -96,6 +105,65 @@ void pv_unmap_guest_l1e(void *p)
     unmap_domain_page(p);
 }
 
+void pv_init_guest_l4_table(l4_pgentry_t l4tab[], const struct domain *d,
+                            bool zap_ro_mpt)
+{
+    /* Xen private mappings. */
+    memcpy(&l4tab[ROOT_PAGETABLE_FIRST_XEN_SLOT],
+           &idle_pg_table[ROOT_PAGETABLE_FIRST_XEN_SLOT],
+           root_pgt_pv_xen_slots * sizeof(l4_pgentry_t));
+#ifndef NDEBUG
+    if ( l4e_get_intpte(split_l4e) )
+        l4tab[ROOT_PAGETABLE_FIRST_XEN_SLOT + root_pgt_pv_xen_slots] =
+            split_l4e;
+#endif
+    l4tab[l4_table_offset(LINEAR_PT_VIRT_START)] =
+        l4e_from_pfn(domain_page_map_to_mfn(l4tab), __PAGE_HYPERVISOR_RW);
+    l4tab[l4_table_offset(PERDOMAIN_VIRT_START)] =
+        l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR_RW);
+    if ( zap_ro_mpt || is_pv_32bit_domain(d) )
+        l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty();
+}
+
+void pv_arch_init_memory(void)
+{
+#ifndef NDEBUG
+    unsigned int i;
+
+    if ( highmem_start )
+    {
+        unsigned long split_va = (unsigned long)__va(highmem_start);
+
+        if ( split_va < HYPERVISOR_VIRT_END &&
+             split_va - 1 == (unsigned long)__va(highmem_start - 1) )
+        {
+            root_pgt_pv_xen_slots = l4_table_offset(split_va) -
+                                    ROOT_PAGETABLE_FIRST_XEN_SLOT;
+            ASSERT(root_pgt_pv_xen_slots < ROOT_PAGETABLE_PV_XEN_SLOTS);
+            if ( l4_table_offset(split_va) == l4_table_offset(split_va - 1) )
+            {
+                l3_pgentry_t *l3tab = alloc_xen_pagetable();
+
+                if ( l3tab )
+                {
+                    const l3_pgentry_t *l3idle =
+                        l4e_to_l3e(idle_pg_table[l4_table_offset(split_va)]);
+
+                    for ( i = 0; i < l3_table_offset(split_va); ++i )
+                        l3tab[i] = l3idle[i];
+                    for ( ; i < L3_PAGETABLE_ENTRIES; ++i )
+                        l3tab[i] = l3e_empty();
+                    split_l4e = l4e_from_pfn(virt_to_mfn(l3tab),
+                                             __PAGE_HYPERVISOR_RW);
+                }
+                else
+                    ++root_pgt_pv_xen_slots;
+            }
+        }
+    }
+#endif
+}
+
 /*
  * How to write an entry to the guest pagetables.
  * Returns false for failure (pointer not valid), true for success.
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 07287d97ca..19c80da995 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -322,8 +322,6 @@ static inline void *__page_to_virt(const struct page_info *pg)
 int free_page_type(struct page_info *page, unsigned long type,
                    int preemptible);
 
-void init_guest_l4_table(l4_pgentry_t[], const struct domain *,
-                         bool_t zap_ro_mpt);
 bool_t fill_ro_mpt(unsigned long mfn);
 void zap_ro_mpt(unsigned long mfn);
 
diff --git a/xen/include/asm-x86/pv/mm.h b/xen/include/asm-x86/pv/mm.h
index a71ce934fa..8fd542e630 100644
--- a/xen/include/asm-x86/pv/mm.h
+++ b/xen/include/asm-x86/pv/mm.h
@@ -90,6 +90,10 @@ bool pv_update_intpte(intpte_t *p, intpte_t old, intpte_t new,
 l1_pgentry_t *pv_map_guest_l1e(unsigned long addr, unsigned long *gl1mfn);
 void pv_unmap_guest_l1e(void *p);
 
+void pv_init_guest_l4_table(l4_pgentry_t[], const struct domain *,
+                            bool zap_ro_mpt);
+void pv_arch_init_memory(void);
+
 #else
 
 static inline void pv_get_guest_eff_l1e(unsigned long addr,
@@ -111,6 +115,10 @@ static inline l1_pgentry_t *pv_map_guest_l1e(unsigned long addr,
 
 static inline void pv_unmap_guest_l1e(void *p) {}
 
+static inline void pv_init_guest_l4_table(l4_pgentry_t l4tab[],
+                                          const struct domain *d,
+                                          bool zap_ro_mpt) {}
+static inline void pv_arch_init_memory(void) {}
 #endif
 
 #endif /* __X86_PV_MM_H__ */
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 21/21] x86/mm: add "pv_" prefix to new_guest_cr3
  2017-07-20 16:04 [PATCH v3 00/21] x86: refactor mm.c (the easy part) Wei Liu
                   ` (19 preceding siblings ...)
  2017-07-20 16:04 ` [PATCH v3 20/21] x86/mm: move l4 table setup code Wei Liu
@ 2017-07-20 16:04 ` Wei Liu
  2017-07-30  6:26 ` [PATCH v3 00/21] x86: refactor mm.c (the easy part) Jan Beulich
  2017-07-30 15:43 ` [PATCH v3 extra 00/11] x86: refactor mm.c: page APIs and hypercalls Wei Liu
  22 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-20 16:04 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Also take the chance to rename d to currd. This function can't be
moved yet; it can only be moved together with other functions.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c              | 18 +++++++++---------
 xen/arch/x86/pv/emul-priv-op.c |  3 ++-
 xen/include/asm-x86/mm.h       |  1 -
 xen/include/asm-x86/pv/mm.h    |  7 +++++++
 4 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 5687e29824..2493ea7fd3 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2695,14 +2695,14 @@ int vcpu_destroy_pagetables(struct vcpu *v)
     return rc != -EINTR ? rc : -ERESTART;
 }
 
-int new_guest_cr3(unsigned long mfn)
+int pv_new_guest_cr3(unsigned long mfn)
 {
     struct vcpu *curr = current;
-    struct domain *d = curr->domain;
+    struct domain *currd = curr->domain;
     int rc;
     unsigned long old_base_mfn;
 
-    if ( is_pv_32bit_domain(d) )
+    if ( is_pv_32bit_domain(currd) )
     {
         unsigned long gt_mfn = pagetable_get_pfn(curr->arch.guest_table);
         l4_pgentry_t *pl4e = map_domain_page(_mfn(gt_mfn));
@@ -2748,9 +2748,9 @@ int new_guest_cr3(unsigned long mfn)
         return 0;
     }
 
-    rc = paging_mode_refcounts(d)
-         ? (!get_page_from_pagenr(mfn, d) ? 0 : -EINVAL)
-         : get_page_and_type_from_pagenr(mfn, PGT_root_page_table, d, 0, 1);
+    rc = paging_mode_refcounts(currd)
+         ? (!get_page_from_pagenr(mfn, currd) ? 0 : -EINVAL)
+         : get_page_and_type_from_pagenr(mfn, PGT_root_page_table, currd, 0, 1);
     switch ( rc )
     {
     case 0:
@@ -2766,7 +2766,7 @@ int new_guest_cr3(unsigned long mfn)
 
     invalidate_shadow_ldt(curr, 0);
 
-    if ( !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) )
+    if ( !VM_ASSIST(currd, m2p_strict) && !paging_mode_refcounts(currd) )
         fill_ro_mpt(mfn);
     curr->arch.guest_table = pagetable_from_pfn(mfn);
     update_cr3(curr);
@@ -2777,7 +2777,7 @@ int new_guest_cr3(unsigned long mfn)
     {
         struct page_info *page = mfn_to_page(old_base_mfn);
 
-        if ( paging_mode_refcounts(d) )
+        if ( paging_mode_refcounts(currd) )
             put_page(page);
         else
             switch ( rc = put_page_and_type_preemptible(page) )
@@ -3102,7 +3102,7 @@ long do_mmuext_op(
             else if ( unlikely(paging_mode_translate(currd)) )
                 rc = -EINVAL;
             else
-                rc = new_guest_cr3(op.arg1.mfn);
+                rc = pv_new_guest_cr3(op.arg1.mfn);
             break;
 
         case MMUEXT_NEW_USER_BASEPTR: {
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index 85185b6b29..936757e03c 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -32,6 +32,7 @@
 #include <asm/hypercall.h>
 #include <asm/mc146818rtc.h>
 #include <asm/p2m.h>
+#include <asm/pv/mm.h>
 #include <asm/pv/traps.h>
 #include <asm/shared.h>
 #include <asm/traps.h>
@@ -768,7 +769,7 @@ static int priv_op_write_cr(unsigned int reg, unsigned long val,
         page = get_page_from_gfn(currd, gfn, NULL, P2M_ALLOC);
         if ( !page )
             break;
-        rc = new_guest_cr3(page_to_mfn(page));
+        rc = pv_new_guest_cr3(page_to_mfn(page));
         put_page(page);
 
         switch ( rc )
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 19c80da995..f48ce7555d 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -534,7 +534,6 @@ void audit_domains(void);
 
 #endif
 
-int new_guest_cr3(unsigned long pfn);
 void make_cr3(struct vcpu *v, unsigned long mfn);
 void update_cr3(struct vcpu *v);
 int vcpu_destroy_pagetables(struct vcpu *);
diff --git a/xen/include/asm-x86/pv/mm.h b/xen/include/asm-x86/pv/mm.h
index 8fd542e630..0192580b41 100644
--- a/xen/include/asm-x86/pv/mm.h
+++ b/xen/include/asm-x86/pv/mm.h
@@ -94,8 +94,12 @@ void pv_init_guest_l4_table(l4_pgentry_t[], const struct domain *,
                             bool zap_ro_mpt);
 void pv_arch_init_memory(void);
 
+int pv_new_guest_cr3(unsigned long pfn);
+
 #else
 
+#include <xen/errno.h>
+
 static inline void pv_get_guest_eff_l1e(unsigned long addr,
                                         l1_pgentry_t *eff_l1e)
 {}
@@ -119,6 +123,9 @@ static inline void pv_init_guest_l4_table(l4_pgentry_t[],
                                           const struct domain *,
                                           bool zap_ro_mpt) {}
 static inline void pv_arch_init_memory(void) {}
+
+static inline int pv_new_guest_cr3(unsigned long pfn) { return -EINVAL; }
+
 #endif
 
 #endif /* __X86_PV_MM_H__ */
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 00/21] x86: refactor mm.c (the easy part)
  2017-07-20 16:04 [PATCH v3 00/21] x86: refactor mm.c (the easy part) Wei Liu
                   ` (20 preceding siblings ...)
  2017-07-20 16:04 ` [PATCH v3 21/21] x86/mm: add "pv_" prefix to new_guest_cr3 Wei Liu
@ 2017-07-30  6:26 ` Jan Beulich
  2017-07-30  9:23   ` Wei Liu
  2017-07-30 15:43 ` [PATCH v3 extra 00/11] x86: refactor mm.c: page APIs and hypercalls Wei Liu
  22 siblings, 1 reply; 39+ messages in thread
From: Jan Beulich @ 2017-07-30  6:26 UTC (permalink / raw)
  To: wei.liu2; +Cc: george.dunlap, andrew.cooper3, xen-devel

>>> Wei Liu <wei.liu2@citrix.com> 07/20/17 6:04 PM >>>
>What is left is mostly PV MMU hypercall functions and their supporting code.
>I'm still thinking about how to refactor those because the helper functions are
>a bit convulted. The helper functions are both used by PV MMU code and the
>common get / put functions. I think I need to refactor the get / put functions.
>If you think there is a better approach please let me know.

Wouldn't it, for example, be possible to simply move {alloc,free}_page_type()
to pv-specific code, providing stubs for the !PV case?
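
Something along these lines, as a sketch only -- the exact names and
the CONFIG_PV guard are of course up for discussion:

#ifdef CONFIG_PV
int pv_alloc_page_type(struct page_info *page, unsigned long type,
                       bool preemptible);
int pv_free_page_type(struct page_info *page, unsigned long type,
                      bool preemptible);
#else
/* Stubs for the !PV case. */
static inline int pv_alloc_page_type(struct page_info *page,
                                     unsigned long type, bool preemptible)
{ return -EINVAL; }
static inline int pv_free_page_type(struct page_info *page,
                                    unsigned long type, bool preemptible)
{ return -EINVAL; }
#endif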

Jan



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v3 00/21] x86: refactor mm.c (the easy part)
  2017-07-30  6:26 ` [PATCH v3 00/21] x86: refactor mm.c (the easy part) Jan Beulich
@ 2017-07-30  9:23   ` Wei Liu
  0 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-30  9:23 UTC (permalink / raw)
  To: Jan Beulich; +Cc: george.dunlap, andrew.cooper3, wei.liu2, xen-devel

On Sun, Jul 30, 2017 at 12:26:59AM -0600, Jan Beulich wrote:
> >>> Wei Liu <wei.liu2@citrix.com> 07/20/17 6:04 PM >>>
> >What is left is mostly PV MMU hypercall functions and their supporting code.
> >I'm still thinking about how to refactor those because the helper functions are
> >a bit convulted. The helper functions are both used by PV MMU code and the
> >common get / put functions. I think I need to refactor the get / put functions.
> >If you think there is a better approach please let me know.
> 
> Wouldn't it, for example, be possible to simply move {alloc,free}_page_type()
> to pv-specific code, providing stubs for the !PV case?
> 

Yes, that's one of the easier ways of doing it. And I'm inclined at this
point to do that.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 extra 00/11] x86: refactor mm.c: page APIs and hypercalls
  2017-07-20 16:04 [PATCH v3 00/21] x86: refactor mm.c (the easy part) Wei Liu
                   ` (21 preceding siblings ...)
  2017-07-30  6:26 ` [PATCH v3 00/21] x86: refactor mm.c (the easy part) Jan Beulich
@ 2017-07-30 15:43 ` Wei Liu
  2017-07-30 15:43   ` [PATCH v3 extra 01/11] x86: add pv_ prefix to {alloc, free}_page_type Wei Liu
                     ` (11 more replies)
  22 siblings, 12 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-30 15:43 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

This series is built on top of the "easy part" [0] (and rebased on top of
current staging).

After discussing with George and Andrew on IRC, it is clear that
alloc_page_type and free_page_type are only useful for PV guests. This
immediately enables us to move them and their supporting code to the PV
directory.

Note that in the stubs I chose to return -EINVAL, but maybe we should just
BUG() there, because those paths aren't supposed to be taken when
!CONFIG_PV. And I'm sure common code will BUG_ON() or BUG() sooner or
later anyway. Thoughts?
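
For concreteness, the stub shape in question is roughly this (a sketch
only, assuming the declarations end up in asm-x86/pv/mm.h):

static inline int pv_alloc_page_type(struct page_info *page,
                                     unsigned long type, bool preemptible)
{
    /*
     * Fail gracefully for now.  The alternative would be BUG(),
     * since this path shouldn't be reachable when !CONFIG_PV.
     */
    return -EINVAL;
}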

PV MMU hypercalls are moved to mm-hypercalls.c to avoid having a very huge
pv/mm.c.

Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>

Wei Liu (11):
  x86: add pv_ prefix to {alloc,free}_page_type
  x86/mm: export more get/put page functions
  x86/mm: move and add pv_ prefix to create_pae_xen_mappings
  x86/mm: move disallow_mask variable and macros
  x86/mm: move pv_{alloc,free}_page_type
  x86/mm: move and add pv_ prefix to invalidate_shadow_ldt
  x86/mm: move PV hypercalls to pv/mm-hypercalls.c
  x86/mm: remove the now unused inclusion of pv/mm.h
  x86/mm: use put_page_type_preemptible in put_page_from_l{2,3}e
  x86/mm: move {get,put}_page_from_l{2,3,4}e
  x86/mm: move description of x86 page table API to pv/mm.c

 xen/arch/x86/domain.c           |    3 +-
 xen/arch/x86/mm.c               | 3018 +++++----------------------------------
 xen/arch/x86/pv/Makefile        |    1 +
 xen/arch/x86/pv/mm-hypercalls.c | 1461 +++++++++++++++++++
 xen/arch/x86/pv/mm.c            |  877 ++++++++++++
 xen/arch/x86/pv/mm.h            |    6 +
 xen/include/asm-x86/mm.h        |   29 +-
 xen/include/asm-x86/pv/mm.h     |   45 +
 8 files changed, 2759 insertions(+), 2681 deletions(-)
 create mode 100644 xen/arch/x86/pv/mm-hypercalls.c
 create mode 100644 xen/arch/x86/pv/mm.h

-- 
2.11.0



^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v3 extra 01/11] x86: add pv_ prefix to {alloc, free}_page_type
  2017-07-30 15:43 ` [PATCH v3 extra 00/11] x86: refactor mm.c: page APIs and hypercalls Wei Liu
@ 2017-07-30 15:43   ` Wei Liu
  2017-07-30 15:43   ` [PATCH v3 extra 02/11] x86/mm: export more get/put page functions Wei Liu
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-30 15:43 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

They are only useful for PV guests. Also change preemptible to bool.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/domain.c    |  2 +-
 xen/arch/x86/mm.c        | 12 ++++++------
 xen/include/asm-x86/mm.h |  4 ++--
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 93014d9bbc..d92a930d29 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1807,7 +1807,7 @@ static int relinquish_memory(
             if ( likely(y == x) )
             {
                 /* No need for atomic update of type_info here: noone else updates it. */
-                switch ( ret = free_page_type(page, x, 1) )
+                switch ( ret = pv_free_page_type(page, x, true) )
                 {
                 case 0:
                     break;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 9b6871ab04..a908d70dea 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2010,8 +2010,8 @@ static void get_page_light(struct page_info *page)
     while ( unlikely(y != x) );
 }
 
-static int alloc_page_type(struct page_info *page, unsigned long type,
-                           int preemptible)
+static int pv_alloc_page_type(struct page_info *page, unsigned long type,
+                              bool preemptible)
 {
     struct domain *owner = page_get_owner(page);
     int rc;
@@ -2083,8 +2083,8 @@ static int alloc_page_type(struct page_info *page, unsigned long type,
 }
 
 
-int free_page_type(struct page_info *page, unsigned long type,
-                   int preemptible)
+int pv_free_page_type(struct page_info *page, unsigned long type,
+                      bool preemptible)
 {
     struct domain *owner = page_get_owner(page);
     unsigned long gmfn;
@@ -2141,7 +2141,7 @@ int free_page_type(struct page_info *page, unsigned long type,
 static int __put_final_page_type(
     struct page_info *page, unsigned long type, int preemptible)
 {
-    int rc = free_page_type(page, type, preemptible);
+    int rc = pv_free_page_type(page, type, preemptible);
 
     /* No need for atomic update of type_info here: noone else updates it. */
     if ( rc == 0 )
@@ -2357,7 +2357,7 @@ static int __get_page_type(struct page_info *page, unsigned long type,
             page->nr_validated_ptes = 0;
             page->partial_pte = 0;
         }
-        rc = alloc_page_type(page, type, preemptible);
+        rc = pv_alloc_page_type(page, type, preemptible);
     }
 
     if ( (x & PGT_partial) && !(nx & PGT_partial) )
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 521a8b1b7b..a5662f327b 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -302,8 +302,8 @@ static inline void *__page_to_virt(const struct page_info *pg)
                     (PAGE_SIZE / (sizeof(*pg) & -sizeof(*pg))));
 }
 
-int free_page_type(struct page_info *page, unsigned long type,
-                   int preemptible);
+int pv_free_page_type(struct page_info *page, unsigned long type,
+                      bool preemptible);
 
 bool_t fill_ro_mpt(unsigned long mfn);
 void zap_ro_mpt(unsigned long mfn);
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 extra 02/11] x86/mm: export more get/put page functions
  2017-07-30 15:43 ` [PATCH v3 extra 00/11] x86: refactor mm.c: page APIs and hypercalls Wei Liu
  2017-07-30 15:43   ` [PATCH v3 extra 01/11] x86: add pv_ prefix to {alloc, free}_page_type Wei Liu
@ 2017-07-30 15:43   ` Wei Liu
  2017-07-30 15:43   ` [PATCH v3 extra 03/11] x86/mm: move and add pv_ prefix to create_pae_xen_mappings Wei Liu
                     ` (9 subsequent siblings)
  11 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-30 15:43 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Export some of the get/put functions so that we can move the PV mm code
chunk by chunk.

Once the code motion is done, some of the functions might be made static
again.

Also fix coding style issues and use bool when appropriate.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c        | 40 ++++++++++++++++++++--------------------
 xen/include/asm-x86/mm.h | 17 +++++++++++++++--
 2 files changed, 35 insertions(+), 22 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index a908d70dea..40f9ad9c98 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -559,9 +559,8 @@ bool get_page_from_mfn(mfn_t mfn, struct domain *d)
 }
 
 
-static int get_page_and_type_from_mfn(
-    mfn_t mfn, unsigned long type, struct domain *d,
-    int partial, int preemptible)
+int get_page_and_type_from_mfn(mfn_t mfn, unsigned long type, struct domain *d,
+                               int partial, bool preemptible)
 {
     struct page_info *page = mfn_to_page(mfn_x(mfn));
     int rc;
@@ -944,7 +943,7 @@ get_page_from_l1e(
  *  <0 => error code
  */
 define_get_linear_pagetable(l2);
-static int
+int
 get_page_from_l2e(
     l2_pgentry_t l2e, unsigned long pfn, struct domain *d)
 {
@@ -963,7 +962,8 @@ get_page_from_l2e(
 
     if ( !(l2e_get_flags(l2e) & _PAGE_PSE) )
     {
-        rc = get_page_and_type_from_mfn(_mfn(mfn), PGT_l1_page_table, d, 0, 0);
+        rc = get_page_and_type_from_mfn(_mfn(mfn), PGT_l1_page_table, d, 0,
+                                        false);
         if ( unlikely(rc == -EINVAL) && get_l2_linear_pagetable(l2e, pfn, d) )
             rc = 0;
         return rc;
@@ -980,7 +980,7 @@ get_page_from_l2e(
  *  <0 => error code
  */
 define_get_linear_pagetable(l3);
-static int
+int
 get_page_from_l3e(
     l3_pgentry_t l3e, unsigned long pfn, struct domain *d, int partial)
 {
@@ -996,8 +996,8 @@ get_page_from_l3e(
         return -EINVAL;
     }
 
-    rc = get_page_and_type_from_mfn(
-        _mfn(l3e_get_pfn(l3e)), PGT_l2_page_table, d, partial, 1);
+    rc = get_page_and_type_from_mfn(_mfn(l3e_get_pfn(l3e)), PGT_l2_page_table,
+                                    d, partial, true);
     if ( unlikely(rc == -EINVAL) &&
          !is_pv_32bit_domain(d) &&
          get_l3_linear_pagetable(l3e, pfn, d) )
@@ -1013,7 +1013,7 @@ get_page_from_l3e(
  *  <0 => error code
  */
 define_get_linear_pagetable(l4);
-static int
+int
 get_page_from_l4e(
     l4_pgentry_t l4e, unsigned long pfn, struct domain *d, int partial)
 {
@@ -1029,8 +1029,8 @@ get_page_from_l4e(
         return -EINVAL;
     }
 
-    rc = get_page_and_type_from_mfn(
-        _mfn(l4e_get_pfn(l4e)), PGT_l3_page_table, d, partial, 1);
+    rc = get_page_and_type_from_mfn(_mfn(l4e_get_pfn(l4e)), PGT_l3_page_table,
+                                    d, partial, true);
     if ( unlikely(rc == -EINVAL) && get_l4_linear_pagetable(l4e, pfn, d) )
         rc = 0;
 
@@ -1101,7 +1101,7 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
  * NB. Virtual address 'l2e' maps to a machine address within frame 'pfn'.
  * Note also that this automatically deals correctly with linear p.t.'s.
  */
-static int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn)
+int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn)
 {
     if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) || (l2e_get_pfn(l2e) == pfn) )
         return 1;
@@ -1121,8 +1121,8 @@ static int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn)
 
 static int __put_page_type(struct page_info *, int preemptible);
 
-static int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn,
-                             int partial, bool defer)
+int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn, int partial,
+                      bool defer)
 {
     struct page_info *pg;
 
@@ -1159,8 +1159,8 @@ static int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn,
     return put_page_and_type_preemptible(pg);
 }
 
-static int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn,
-                             int partial, bool defer)
+int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn, int partial,
+                      bool defer)
 {
     if ( (l4e_get_flags(l4e) & _PAGE_PRESENT) &&
          (l4e_get_pfn(l4e) != pfn) )
@@ -1344,7 +1344,7 @@ static int alloc_l3_table(struct page_info *page)
             else
                 rc = get_page_and_type_from_mfn(
                     _mfn(l3e_get_pfn(pl3e[i])),
-                    PGT_l2_page_table | PGT_pae_xen_l2, d, partial, 1);
+                    PGT_l2_page_table | PGT_pae_xen_l2, d, partial, true);
         }
         else if ( !is_guest_l3_slot(i) ||
                   (rc = get_page_from_l3e(pl3e[i], pfn, d, partial)) > 0 )
@@ -1996,7 +1996,7 @@ int get_page(struct page_info *page, struct domain *domain)
  *   acquired reference again.
  * Due to get_page() reserving one reference, this call cannot fail.
  */
-static void get_page_light(struct page_info *page)
+void get_page_light(struct page_info *page)
 {
     unsigned long x, nx, y = page->count_info;
 
@@ -2529,7 +2529,7 @@ int pv_new_guest_cr3(unsigned long mfn)
     rc = paging_mode_refcounts(currd)
          ? (get_page_from_mfn(_mfn(mfn), currd) ? 0 : -EINVAL)
          : get_page_and_type_from_mfn(_mfn(mfn), PGT_root_page_table,
-                                      currd, 0, 1);
+                                      currd, 0, true);
     switch ( rc )
     {
     case 0:
@@ -2905,7 +2905,7 @@ long do_mmuext_op(
             if ( op.arg1.mfn != 0 )
             {
                 rc = get_page_and_type_from_mfn(
-                    _mfn(op.arg1.mfn), PGT_root_page_table, currd, 0, 1);
+                    _mfn(op.arg1.mfn), PGT_root_page_table, currd, 0, true);
 
                 if ( unlikely(rc) )
                 {
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index a5662f327b..07d4c06fc3 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -339,10 +339,23 @@ int  get_page_type(struct page_info *page, unsigned long type);
 int  put_page_type_preemptible(struct page_info *page);
 int  get_page_type_preemptible(struct page_info *page, unsigned long type);
 int  put_old_guest_table(struct vcpu *);
-int  get_page_from_l1e(
-    l1_pgentry_t l1e, struct domain *l1e_owner, struct domain *pg_owner);
+int  get_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner,
+                       struct domain *pg_owner);
 void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner);
+int get_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn, struct domain *d);
+int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn);
+int get_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn, struct domain *d,
+                      int partial);
+int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn, int partial,
+                      bool defer);
+int get_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn, struct domain *d,
+                      int partial);
+int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn, int partial,
+                      bool defer);
+void get_page_light(struct page_info *page);
 bool get_page_from_mfn(mfn_t mfn, struct domain *d);
+int get_page_and_type_from_mfn(mfn_t mfn, unsigned long type, struct domain *d,
+                               int partial, bool preemptible);
 
 static inline void put_page_and_type(struct page_info *page)
 {
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 extra 03/11] x86/mm: move and add pv_ prefix to create_pae_xen_mappings
  2017-07-30 15:43 ` [PATCH v3 extra 00/11] x86: refactor mm.c: page APIs and hypercalls Wei Liu
  2017-07-30 15:43   ` [PATCH v3 extra 01/11] x86: add pv_ prefix to {alloc, free}_page_type Wei Liu
  2017-07-30 15:43   ` [PATCH v3 extra 02/11] x86/mm: export more get/put page functions Wei Liu
@ 2017-07-30 15:43   ` Wei Liu
  2017-07-30 15:43   ` [PATCH v3 extra 04/11] x86/mm: move disallow_mask variable and macros Wei Liu
                     ` (8 subsequent siblings)
  11 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-30 15:43 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

And export it via a local header because it is going to be used by
several PV-specific files.

Take the chance to change its return type to bool.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c    | 46 ++++------------------------------------------
 xen/arch/x86/pv/mm.c | 40 ++++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/pv/mm.h |  6 ++++++
 3 files changed, 50 insertions(+), 42 deletions(-)
 create mode 100644 xen/arch/x86/pv/mm.h

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 40f9ad9c98..0c6a6de1a9 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -127,6 +127,8 @@
 #include <asm/pv/grant_table.h>
 #include <asm/pv/mm.h>
 
+#include "pv/mm.h"
+
 /* Mapping of the fixmap space needed early. */
 l1_pgentry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
     l1_fixmap[L1_PAGETABLE_ENTRIES];
@@ -1223,46 +1225,6 @@ static int alloc_l1_table(struct page_info *page)
     return ret;
 }
 
-static int create_pae_xen_mappings(struct domain *d, l3_pgentry_t *pl3e)
-{
-    struct page_info *page;
-    l3_pgentry_t     l3e3;
-
-    if ( !is_pv_32bit_domain(d) )
-        return 1;
-
-    pl3e = (l3_pgentry_t *)((unsigned long)pl3e & PAGE_MASK);
-
-    /* 3rd L3 slot contains L2 with Xen-private mappings. It *must* exist. */
-    l3e3 = pl3e[3];
-    if ( !(l3e_get_flags(l3e3) & _PAGE_PRESENT) )
-    {
-        gdprintk(XENLOG_WARNING, "PAE L3 3rd slot is empty\n");
-        return 0;
-    }
-
-    /*
-     * The Xen-private mappings include linear mappings. The L2 thus cannot
-     * be shared by multiple L3 tables. The test here is adequate because:
-     *  1. Cannot appear in slots != 3 because get_page_type() checks the
-     *     PGT_pae_xen_l2 flag, which is asserted iff the L2 appears in slot 3
-     *  2. Cannot appear in another page table's L3:
-     *     a. alloc_l3_table() calls this function and this check will fail
-     *     b. mod_l3_entry() disallows updates to slot 3 in an existing table
-     */
-    page = l3e_get_page(l3e3);
-    BUG_ON(page->u.inuse.type_info & PGT_pinned);
-    BUG_ON((page->u.inuse.type_info & PGT_count_mask) == 0);
-    BUG_ON(!(page->u.inuse.type_info & PGT_pae_xen_l2));
-    if ( (page->u.inuse.type_info & PGT_count_mask) != 1 )
-    {
-        gdprintk(XENLOG_WARNING, "PAE L3 3rd slot is shared\n");
-        return 0;
-    }
-
-    return 1;
-}
-
 static int alloc_l2_table(struct page_info *page, unsigned long type,
                           int preemptible)
 {
@@ -1367,7 +1329,7 @@ static int alloc_l3_table(struct page_info *page)
         adjust_guest_l3e(pl3e[i], d);
     }
 
-    if ( rc >= 0 && !create_pae_xen_mappings(d, pl3e) )
+    if ( rc >= 0 && !pv_create_pae_xen_mappings(d, pl3e) )
         rc = -EINVAL;
     if ( rc < 0 && rc != -ERESTART && rc != -EINTR )
     {
@@ -1839,7 +1801,7 @@ static int mod_l3_entry(l3_pgentry_t *pl3e,
     }
 
     if ( likely(rc == 0) )
-        if ( !create_pae_xen_mappings(d, pl3e) )
+        if ( !pv_create_pae_xen_mappings(d, pl3e) )
             BUG();
 
     put_page_from_l3e(ol3e, pfn, 0, 1);
diff --git a/xen/arch/x86/pv/mm.c b/xen/arch/x86/pv/mm.c
index 0f4303cef2..46e1fcf4e5 100644
--- a/xen/arch/x86/pv/mm.c
+++ b/xen/arch/x86/pv/mm.c
@@ -211,6 +211,46 @@ bool pv_update_intpte(intpte_t *p, intpte_t old, intpte_t new,
     return rv;
 }
 
+bool pv_create_pae_xen_mappings(struct domain *d, l3_pgentry_t *pl3e)
+{
+    struct page_info *page;
+    l3_pgentry_t     l3e3;
+
+    if ( !is_pv_32bit_domain(d) )
+        return true;
+
+    pl3e = (l3_pgentry_t *)((unsigned long)pl3e & PAGE_MASK);
+
+    /* 3rd L3 slot contains L2 with Xen-private mappings. It *must* exist. */
+    l3e3 = pl3e[3];
+    if ( !(l3e_get_flags(l3e3) & _PAGE_PRESENT) )
+    {
+        gdprintk(XENLOG_WARNING, "PAE L3 3rd slot is empty\n");
+        return false;
+    }
+
+    /*
+     * The Xen-private mappings include linear mappings. The L2 thus cannot
+     * be shared by multiple L3 tables. The test here is adequate because:
+     *  1. Cannot appear in slots != 3 because get_page_type() checks the
+     *     PGT_pae_xen_l2 flag, which is asserted iff the L2 appears in slot 3
+     *  2. Cannot appear in another page table's L3:
+     *     a. alloc_l3_table() calls this function and this check will fail
+     *     b. mod_l3_entry() disallows updates to slot 3 in an existing table
+     */
+    page = l3e_get_page(l3e3);
+    BUG_ON(page->u.inuse.type_info & PGT_pinned);
+    BUG_ON((page->u.inuse.type_info & PGT_count_mask) == 0);
+    BUG_ON(!(page->u.inuse.type_info & PGT_pae_xen_l2));
+    if ( (page->u.inuse.type_info & PGT_count_mask) != 1 )
+    {
+        gdprintk(XENLOG_WARNING, "PAE L3 3rd slot is shared\n");
+        return false;
+    }
+
+    return true;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/pv/mm.h b/xen/arch/x86/pv/mm.h
new file mode 100644
index 0000000000..bafc2b6116
--- /dev/null
+++ b/xen/arch/x86/pv/mm.h
@@ -0,0 +1,6 @@
+#ifndef __PV_MM_H__
+#define __PV_MM_H__
+
+bool pv_create_pae_xen_mappings(struct domain *d, l3_pgentry_t *pl3e);
+
+#endif /* __PV_MM_H__ */
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 extra 04/11] x86/mm: move disallow_mask variable and macros
  2017-07-30 15:43 ` [PATCH v3 extra 00/11] x86: refactor mm.c: page APIs and hypercalls Wei Liu
                     ` (2 preceding siblings ...)
  2017-07-30 15:43   ` [PATCH v3 extra 03/11] x86/mm: move and add pv_ prefix to create_pae_xen_mappings Wei Liu
@ 2017-07-30 15:43   ` Wei Liu
  2017-07-30 15:43   ` [PATCH v3 extra 05/11] x86/mm: move pv_{alloc, free}_page_type Wei Liu
                     ` (7 subsequent siblings)
  11 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-30 15:43 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

They will be used by both common mm code and PV mm code in the next
few patches. Note that they might be moved again later if they aren't
needed by common mm code any more.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c        | 19 +------------------
 xen/include/asm-x86/mm.h | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 0c6a6de1a9..5545a6f4de 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -146,24 +146,7 @@ bool __read_mostly machine_to_phys_mapping_valid;
 
 struct rangeset *__read_mostly mmio_ro_ranges;
 
-static uint32_t base_disallow_mask;
-/* Global bit is allowed to be set on L1 PTEs. Intended for user mappings. */
-#define L1_DISALLOW_MASK ((base_disallow_mask | _PAGE_GNTTAB) & ~_PAGE_GLOBAL)
-
-#define L2_DISALLOW_MASK base_disallow_mask
-
-#define l3_disallow_mask(d) (!is_pv_32bit_domain(d) ? \
-                             base_disallow_mask : 0xFFFFF198U)
-
-#define L4_DISALLOW_MASK (base_disallow_mask)
-
-#define l1_disallow_mask(d)                                     \
-    ((d != dom_io) &&                                           \
-     (rangeset_is_empty((d)->iomem_caps) &&                     \
-      rangeset_is_empty((d)->arch.ioport_caps) &&               \
-      !has_arch_pdevs(d) &&                                     \
-      is_pv_domain(d)) ?                                        \
-     L1_DISALLOW_MASK : (L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS))
+uint32_t base_disallow_mask;
 
 static s8 __read_mostly opt_mmio_relax;
 static void __init parse_mmio_relax(const char *s)
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 07d4c06fc3..6857651db1 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -334,6 +334,25 @@ const unsigned long *get_platform_badpages(unsigned int *array_size);
 int page_lock(struct page_info *page);
 void page_unlock(struct page_info *page);
 
+extern uint32_t base_disallow_mask;
+/* Global bit is allowed to be set on L1 PTEs. Intended for user mappings. */
+#define L1_DISALLOW_MASK ((base_disallow_mask | _PAGE_GNTTAB) & ~_PAGE_GLOBAL)
+
+#define L2_DISALLOW_MASK base_disallow_mask
+
+#define l3_disallow_mask(d) (!is_pv_32bit_domain(d) ? \
+                             base_disallow_mask : 0xFFFFF198U)
+
+#define L4_DISALLOW_MASK (base_disallow_mask)
+
+#define l1_disallow_mask(d)                                     \
+    ((d != dom_io) &&                                           \
+     (rangeset_is_empty((d)->iomem_caps) &&                     \
+      rangeset_is_empty((d)->arch.ioport_caps) &&               \
+      !has_arch_pdevs(d) &&                                     \
+      is_pv_domain(d)) ?                                        \
+     L1_DISALLOW_MASK : (L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS))
+
 void put_page_type(struct page_info *page);
 int  get_page_type(struct page_info *page, unsigned long type);
 int  put_page_type_preemptible(struct page_info *page);
-- 
2.11.0



^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v3 extra 05/11] x86/mm: move pv_{alloc, free}_page_type
  2017-07-30 15:43 ` [PATCH v3 extra 00/11] x86: refactor mm.c: page APIs and hypercalls Wei Liu
                     ` (3 preceding siblings ...)
  2017-07-30 15:43   ` [PATCH v3 extra 04/11] x86/mm: move disallow_mask variable and macros Wei Liu
@ 2017-07-30 15:43   ` Wei Liu
  2017-07-30 15:43   ` [PATCH v3 extra 06/11] x86/mm: move and add pv_ prefix to invalidate_shadow_ldt Wei Liu
                     ` (6 subsequent siblings)
  11 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-30 15:43 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Move them and the helper functions to pv/mm.c.  Use bool in the moved
code where appropriate.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/domain.c       |   1 +
 xen/arch/x86/mm.c           | 492 --------------------------------------------
 xen/arch/x86/pv/mm.c        | 491 +++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/mm.h    |   3 -
 xen/include/asm-x86/pv/mm.h |  12 ++
 5 files changed, 504 insertions(+), 495 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index d92a930d29..36225631eb 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -64,6 +64,7 @@
 #include <compat/vcpu.h>
 #include <asm/psr.h>
 #include <asm/pv/domain.h>
+#include <asm/pv/mm.h>
 #include <asm/pv/processor.h>
 
 DEFINE_PER_CPU(struct vcpu *, curr_vcpu);
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 5545a6f4de..ac0e0ba346 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -514,21 +514,6 @@ static void invalidate_shadow_ldt(struct vcpu *v, int flush)
 }
 
 
-static int alloc_segdesc_page(struct page_info *page)
-{
-    const struct domain *owner = page_get_owner(page);
-    struct desc_struct *descs = __map_domain_page(page);
-    unsigned i;
-
-    for ( i = 0; i < 512; i++ )
-        if ( unlikely(!check_descriptor(owner, &descs[i])) )
-            break;
-
-    unmap_domain_page(descs);
-
-    return i == 512 ? 0 : -EINVAL;
-}
-
 bool get_page_from_mfn(mfn_t mfn, struct domain *d)
 {
     struct page_info *page = mfn_to_page(mfn_x(mfn));
@@ -543,7 +528,6 @@ bool get_page_from_mfn(mfn_t mfn, struct domain *d)
     return true;
 }
 
-
 int get_page_and_type_from_mfn(mfn_t mfn, unsigned long type, struct domain *d,
                                int partial, bool preemptible)
 {
@@ -1169,172 +1153,6 @@ int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn, int partial,
     return 1;
 }
 
-static int alloc_l1_table(struct page_info *page)
-{
-    struct domain *d = page_get_owner(page);
-    unsigned long  pfn = page_to_mfn(page);
-    l1_pgentry_t  *pl1e;
-    unsigned int   i;
-    int            ret = 0;
-
-    pl1e = map_domain_page(_mfn(pfn));
-
-    for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
-    {
-        switch ( ret = get_page_from_l1e(pl1e[i], d, d) )
-        {
-        default:
-            goto fail;
-        case 0:
-            break;
-        case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
-            ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
-            l1e_flip_flags(pl1e[i], ret);
-            break;
-        }
-
-        adjust_guest_l1e(pl1e[i], d);
-    }
-
-    unmap_domain_page(pl1e);
-    return 0;
-
- fail:
-    gdprintk(XENLOG_WARNING, "Failure in alloc_l1_table: slot %#x\n", i);
-    while ( i-- > 0 )
-        put_page_from_l1e(pl1e[i], d);
-
-    unmap_domain_page(pl1e);
-    return ret;
-}
-
-static int alloc_l2_table(struct page_info *page, unsigned long type,
-                          int preemptible)
-{
-    struct domain *d = page_get_owner(page);
-    unsigned long  pfn = page_to_mfn(page);
-    l2_pgentry_t  *pl2e;
-    unsigned int   i;
-    int            rc = 0;
-
-    pl2e = map_domain_page(_mfn(pfn));
-
-    for ( i = page->nr_validated_ptes; i < L2_PAGETABLE_ENTRIES; i++ )
-    {
-        if ( preemptible && i > page->nr_validated_ptes
-             && hypercall_preempt_check() )
-        {
-            page->nr_validated_ptes = i;
-            rc = -ERESTART;
-            break;
-        }
-
-        if ( !is_guest_l2_slot(d, type, i) ||
-             (rc = get_page_from_l2e(pl2e[i], pfn, d)) > 0 )
-            continue;
-
-        if ( rc < 0 )
-        {
-            gdprintk(XENLOG_WARNING, "Failure in alloc_l2_table: slot %#x\n", i);
-            while ( i-- > 0 )
-                if ( is_guest_l2_slot(d, type, i) )
-                    put_page_from_l2e(pl2e[i], pfn);
-            break;
-        }
-
-        adjust_guest_l2e(pl2e[i], d);
-    }
-
-    if ( rc >= 0 && (type & PGT_pae_xen_l2) )
-    {
-        /* Xen private mappings. */
-        memcpy(&pl2e[COMPAT_L2_PAGETABLE_FIRST_XEN_SLOT(d)],
-               &compat_idle_pg_table_l2[
-                   l2_table_offset(HIRO_COMPAT_MPT_VIRT_START)],
-               COMPAT_L2_PAGETABLE_XEN_SLOTS(d) * sizeof(*pl2e));
-    }
-
-    unmap_domain_page(pl2e);
-    return rc > 0 ? 0 : rc;
-}
-
-static int alloc_l3_table(struct page_info *page)
-{
-    struct domain *d = page_get_owner(page);
-    unsigned long  pfn = page_to_mfn(page);
-    l3_pgentry_t  *pl3e;
-    unsigned int   i;
-    int            rc = 0, partial = page->partial_pte;
-
-    pl3e = map_domain_page(_mfn(pfn));
-
-    /*
-     * PAE guests allocate full pages, but aren't required to initialize
-     * more than the first four entries; when running in compatibility
-     * mode, however, the full page is visible to the MMU, and hence all
-     * 512 entries must be valid/verified, which is most easily achieved
-     * by clearing them out.
-     */
-    if ( is_pv_32bit_domain(d) )
-        memset(pl3e + 4, 0, (L3_PAGETABLE_ENTRIES - 4) * sizeof(*pl3e));
-
-    for ( i = page->nr_validated_ptes; i < L3_PAGETABLE_ENTRIES;
-          i++, partial = 0 )
-    {
-        if ( is_pv_32bit_domain(d) && (i == 3) )
-        {
-            if ( !(l3e_get_flags(pl3e[i]) & _PAGE_PRESENT) ||
-                 (l3e_get_flags(pl3e[i]) & l3_disallow_mask(d)) )
-                rc = -EINVAL;
-            else
-                rc = get_page_and_type_from_mfn(
-                    _mfn(l3e_get_pfn(pl3e[i])),
-                    PGT_l2_page_table | PGT_pae_xen_l2, d, partial, true);
-        }
-        else if ( !is_guest_l3_slot(i) ||
-                  (rc = get_page_from_l3e(pl3e[i], pfn, d, partial)) > 0 )
-            continue;
-
-        if ( rc == -ERESTART )
-        {
-            page->nr_validated_ptes = i;
-            page->partial_pte = partial ?: 1;
-        }
-        else if ( rc == -EINTR && i )
-        {
-            page->nr_validated_ptes = i;
-            page->partial_pte = 0;
-            rc = -ERESTART;
-        }
-        if ( rc < 0 )
-            break;
-
-        adjust_guest_l3e(pl3e[i], d);
-    }
-
-    if ( rc >= 0 && !pv_create_pae_xen_mappings(d, pl3e) )
-        rc = -EINVAL;
-    if ( rc < 0 && rc != -ERESTART && rc != -EINTR )
-    {
-        gdprintk(XENLOG_WARNING, "Failure in alloc_l3_table: slot %#x\n", i);
-        if ( i )
-        {
-            page->nr_validated_ptes = i;
-            page->partial_pte = 0;
-            current->arch.old_guest_table = page;
-        }
-        while ( i-- > 0 )
-        {
-            if ( !is_guest_l3_slot(i) )
-                continue;
-            unadjust_guest_l3e(pl3e[i], d);
-        }
-    }
-
-    unmap_domain_page(pl3e);
-    return rc > 0 ? 0 : rc;
-}
-
 bool fill_ro_mpt(unsigned long mfn)
 {
     l4_pgentry_t *l4tab = map_domain_page(_mfn(mfn));
@@ -1359,188 +1177,6 @@ void zap_ro_mpt(unsigned long mfn)
     unmap_domain_page(l4tab);
 }
 
-static int alloc_l4_table(struct page_info *page)
-{
-    struct domain *d = page_get_owner(page);
-    unsigned long  pfn = page_to_mfn(page);
-    l4_pgentry_t  *pl4e = map_domain_page(_mfn(pfn));
-    unsigned int   i;
-    int            rc = 0, partial = page->partial_pte;
-
-    for ( i = page->nr_validated_ptes; i < L4_PAGETABLE_ENTRIES;
-          i++, partial = 0 )
-    {
-        if ( !is_guest_l4_slot(d, i) ||
-             (rc = get_page_from_l4e(pl4e[i], pfn, d, partial)) > 0 )
-            continue;
-
-        if ( rc == -ERESTART )
-        {
-            page->nr_validated_ptes = i;
-            page->partial_pte = partial ?: 1;
-        }
-        else if ( rc < 0 )
-        {
-            if ( rc != -EINTR )
-                gdprintk(XENLOG_WARNING,
-                         "Failure in alloc_l4_table: slot %#x\n", i);
-            if ( i )
-            {
-                page->nr_validated_ptes = i;
-                page->partial_pte = 0;
-                if ( rc == -EINTR )
-                    rc = -ERESTART;
-                else
-                {
-                    if ( current->arch.old_guest_table )
-                        page->nr_validated_ptes++;
-                    current->arch.old_guest_table = page;
-                }
-            }
-        }
-        if ( rc < 0 )
-        {
-            unmap_domain_page(pl4e);
-            return rc;
-        }
-
-        adjust_guest_l4e(pl4e[i], d);
-    }
-
-    if ( rc >= 0 )
-    {
-        pv_init_guest_l4_table(pl4e, d, !VM_ASSIST(d, m2p_strict));
-        atomic_inc(&d->arch.pv_domain.nr_l4_pages);
-        rc = 0;
-    }
-    unmap_domain_page(pl4e);
-
-    return rc;
-}
-
-static void free_l1_table(struct page_info *page)
-{
-    struct domain *d = page_get_owner(page);
-    unsigned long pfn = page_to_mfn(page);
-    l1_pgentry_t *pl1e;
-    unsigned int  i;
-
-    pl1e = map_domain_page(_mfn(pfn));
-
-    for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
-        put_page_from_l1e(pl1e[i], d);
-
-    unmap_domain_page(pl1e);
-}
-
-
-static int free_l2_table(struct page_info *page, int preemptible)
-{
-    struct domain *d = page_get_owner(page);
-    unsigned long pfn = page_to_mfn(page);
-    l2_pgentry_t *pl2e;
-    unsigned int  i = page->nr_validated_ptes - 1;
-    int err = 0;
-
-    pl2e = map_domain_page(_mfn(pfn));
-
-    ASSERT(page->nr_validated_ptes);
-    do {
-        if ( is_guest_l2_slot(d, page->u.inuse.type_info, i) &&
-             put_page_from_l2e(pl2e[i], pfn) == 0 &&
-             preemptible && i && hypercall_preempt_check() )
-        {
-           page->nr_validated_ptes = i;
-           err = -ERESTART;
-        }
-    } while ( !err && i-- );
-
-    unmap_domain_page(pl2e);
-
-    if ( !err )
-        page->u.inuse.type_info &= ~PGT_pae_xen_l2;
-
-    return err;
-}
-
-static int free_l3_table(struct page_info *page)
-{
-    struct domain *d = page_get_owner(page);
-    unsigned long pfn = page_to_mfn(page);
-    l3_pgentry_t *pl3e;
-    int rc = 0, partial = page->partial_pte;
-    unsigned int  i = page->nr_validated_ptes - !partial;
-
-    pl3e = map_domain_page(_mfn(pfn));
-
-    do {
-        if ( is_guest_l3_slot(i) )
-        {
-            rc = put_page_from_l3e(pl3e[i], pfn, partial, 0);
-            if ( rc < 0 )
-                break;
-            partial = 0;
-            if ( rc > 0 )
-                continue;
-            unadjust_guest_l3e(pl3e[i], d);
-        }
-    } while ( i-- );
-
-    unmap_domain_page(pl3e);
-
-    if ( rc == -ERESTART )
-    {
-        page->nr_validated_ptes = i;
-        page->partial_pte = partial ?: -1;
-    }
-    else if ( rc == -EINTR && i < L3_PAGETABLE_ENTRIES - 1 )
-    {
-        page->nr_validated_ptes = i + 1;
-        page->partial_pte = 0;
-        rc = -ERESTART;
-    }
-    return rc > 0 ? 0 : rc;
-}
-
-static int free_l4_table(struct page_info *page)
-{
-    struct domain *d = page_get_owner(page);
-    unsigned long pfn = page_to_mfn(page);
-    l4_pgentry_t *pl4e = map_domain_page(_mfn(pfn));
-    int rc = 0, partial = page->partial_pte;
-    unsigned int  i = page->nr_validated_ptes - !partial;
-
-    do {
-        if ( is_guest_l4_slot(d, i) )
-            rc = put_page_from_l4e(pl4e[i], pfn, partial, 0);
-        if ( rc < 0 )
-            break;
-        partial = 0;
-    } while ( i-- );
-
-    if ( rc == -ERESTART )
-    {
-        page->nr_validated_ptes = i;
-        page->partial_pte = partial ?: -1;
-    }
-    else if ( rc == -EINTR && i < L4_PAGETABLE_ENTRIES - 1 )
-    {
-        page->nr_validated_ptes = i + 1;
-        page->partial_pte = 0;
-        rc = -ERESTART;
-    }
-
-    unmap_domain_page(pl4e);
-
-    if ( rc >= 0 )
-    {
-        atomic_dec(&d->arch.pv_domain.nr_l4_pages);
-        rc = 0;
-    }
-
-    return rc;
-}
-
 int page_lock(struct page_info *page)
 {
     unsigned long x, nx;
@@ -1955,134 +1591,6 @@ void get_page_light(struct page_info *page)
     while ( unlikely(y != x) );
 }
 
-static int pv_alloc_page_type(struct page_info *page, unsigned long type,
-                              bool preemptible)
-{
-    struct domain *owner = page_get_owner(page);
-    int rc;
-
-    /* A page table is dirtied when its type count becomes non-zero. */
-    if ( likely(owner != NULL) )
-        paging_mark_dirty(owner, _mfn(page_to_mfn(page)));
-
-    switch ( type & PGT_type_mask )
-    {
-    case PGT_l1_page_table:
-        rc = alloc_l1_table(page);
-        break;
-    case PGT_l2_page_table:
-        rc = alloc_l2_table(page, type, preemptible);
-        break;
-    case PGT_l3_page_table:
-        ASSERT(preemptible);
-        rc = alloc_l3_table(page);
-        break;
-    case PGT_l4_page_table:
-        ASSERT(preemptible);
-        rc = alloc_l4_table(page);
-        break;
-    case PGT_seg_desc_page:
-        rc = alloc_segdesc_page(page);
-        break;
-    default:
-        printk("Bad type in alloc_page_type %lx t=%" PRtype_info " c=%lx\n",
-               type, page->u.inuse.type_info,
-               page->count_info);
-        rc = -EINVAL;
-        BUG();
-    }
-
-    /* No need for atomic update of type_info here: noone else updates it. */
-    smp_wmb();
-    switch ( rc )
-    {
-    case 0:
-        page->u.inuse.type_info |= PGT_validated;
-        break;
-    case -EINTR:
-        ASSERT((page->u.inuse.type_info &
-                (PGT_count_mask|PGT_validated|PGT_partial)) == 1);
-        page->u.inuse.type_info &= ~PGT_count_mask;
-        break;
-    default:
-        ASSERT(rc < 0);
-        gdprintk(XENLOG_WARNING, "Error while validating mfn %" PRI_mfn
-                 " (pfn %" PRI_pfn ") for type %" PRtype_info
-                 ": caf=%08lx taf=%" PRtype_info "\n",
-                 page_to_mfn(page), get_gpfn_from_mfn(page_to_mfn(page)),
-                 type, page->count_info, page->u.inuse.type_info);
-        if ( page != current->arch.old_guest_table )
-            page->u.inuse.type_info = 0;
-        else
-        {
-            ASSERT((page->u.inuse.type_info &
-                    (PGT_count_mask | PGT_validated)) == 1);
-    case -ERESTART:
-            get_page_light(page);
-            page->u.inuse.type_info |= PGT_partial;
-        }
-        break;
-    }
-
-    return rc;
-}
-
-
-int pv_free_page_type(struct page_info *page, unsigned long type,
-                      bool preemptible)
-{
-    struct domain *owner = page_get_owner(page);
-    unsigned long gmfn;
-    int rc;
-
-    if ( likely(owner != NULL) && unlikely(paging_mode_enabled(owner)) )
-    {
-        /* A page table is dirtied when its type count becomes zero. */
-        paging_mark_dirty(owner, _mfn(page_to_mfn(page)));
-
-        ASSERT(!shadow_mode_refcounts(owner));
-
-        gmfn = mfn_to_gmfn(owner, page_to_mfn(page));
-        ASSERT(VALID_M2P(gmfn));
-        /* Page sharing not supported for shadowed domains */
-        if(!SHARED_M2P(gmfn))
-            shadow_remove_all_shadows(owner, _mfn(gmfn));
-    }
-
-    if ( !(type & PGT_partial) )
-    {
-        page->nr_validated_ptes = 1U << PAGETABLE_ORDER;
-        page->partial_pte = 0;
-    }
-
-    switch ( type & PGT_type_mask )
-    {
-    case PGT_l1_page_table:
-        free_l1_table(page);
-        rc = 0;
-        break;
-    case PGT_l2_page_table:
-        rc = free_l2_table(page, preemptible);
-        break;
-    case PGT_l3_page_table:
-        ASSERT(preemptible);
-        rc = free_l3_table(page);
-        break;
-    case PGT_l4_page_table:
-        ASSERT(preemptible);
-        rc = free_l4_table(page);
-        break;
-    default:
-        gdprintk(XENLOG_WARNING, "type %" PRtype_info " mfn %" PRI_mfn "\n",
-                 type, page_to_mfn(page));
-        rc = -EINVAL;
-        BUG();
-    }
-
-    return rc;
-}
-
-
 static int __put_final_page_type(
     struct page_info *page, unsigned long type, int preemptible)
 {
diff --git a/xen/arch/x86/pv/mm.c b/xen/arch/x86/pv/mm.c
index 46e1fcf4e5..f0393b9e3c 100644
--- a/xen/arch/x86/pv/mm.c
+++ b/xen/arch/x86/pv/mm.c
@@ -20,10 +20,13 @@
  * along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <xen/event.h>
 #include <xen/guest_access.h>
 
+#include <asm/mm.h>
 #include <asm/pv/mm.h>
 #include <asm/setup.h>
+#include <asm/shadow.h>
 
 /*
  * PTE updates can be done with ordinary writes except:
@@ -251,6 +254,494 @@ bool pv_create_pae_xen_mappings(struct domain *d, l3_pgentry_t *pl3e)
     return true;
 }
 
+static int alloc_l1_table(struct page_info *page)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long  pfn = page_to_mfn(page);
+    l1_pgentry_t  *pl1e;
+    unsigned int   i;
+    int            ret = 0;
+
+    pl1e = map_domain_page(_mfn(pfn));
+
+    for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
+    {
+        switch ( ret = get_page_from_l1e(pl1e[i], d, d) )
+        {
+        default:
+            goto fail;
+        case 0:
+            break;
+        case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+            ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+            l1e_flip_flags(pl1e[i], ret);
+            break;
+        }
+
+        adjust_guest_l1e(pl1e[i], d);
+    }
+
+    unmap_domain_page(pl1e);
+    return 0;
+
+ fail:
+    gdprintk(XENLOG_WARNING, "Failure in alloc_l1_table: slot %#x\n", i);
+    while ( i-- > 0 )
+        put_page_from_l1e(pl1e[i], d);
+
+    unmap_domain_page(pl1e);
+    return ret;
+}
+
+static int alloc_l2_table(struct page_info *page, unsigned long type,
+                          bool preemptible)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long  pfn = page_to_mfn(page);
+    l2_pgentry_t  *pl2e;
+    unsigned int   i;
+    int            rc = 0;
+
+    pl2e = map_domain_page(_mfn(pfn));
+
+    for ( i = page->nr_validated_ptes; i < L2_PAGETABLE_ENTRIES; i++ )
+    {
+        if ( preemptible && i > page->nr_validated_ptes
+             && hypercall_preempt_check() )
+        {
+            page->nr_validated_ptes = i;
+            rc = -ERESTART;
+            break;
+        }
+
+        if ( !is_guest_l2_slot(d, type, i) ||
+             (rc = get_page_from_l2e(pl2e[i], pfn, d)) > 0 )
+            continue;
+
+        if ( rc < 0 )
+        {
+            gdprintk(XENLOG_WARNING, "Failure in alloc_l2_table: slot %#x\n", i);
+            while ( i-- > 0 )
+                if ( is_guest_l2_slot(d, type, i) )
+                    put_page_from_l2e(pl2e[i], pfn);
+            break;
+        }
+
+        adjust_guest_l2e(pl2e[i], d);
+    }
+
+    if ( rc >= 0 && (type & PGT_pae_xen_l2) )
+    {
+        /* Xen private mappings. */
+        memcpy(&pl2e[COMPAT_L2_PAGETABLE_FIRST_XEN_SLOT(d)],
+               &compat_idle_pg_table_l2[
+                   l2_table_offset(HIRO_COMPAT_MPT_VIRT_START)],
+               COMPAT_L2_PAGETABLE_XEN_SLOTS(d) * sizeof(*pl2e));
+    }
+
+    unmap_domain_page(pl2e);
+    return rc > 0 ? 0 : rc;
+}
+
+static int alloc_l3_table(struct page_info *page)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long  pfn = page_to_mfn(page);
+    l3_pgentry_t  *pl3e;
+    unsigned int   i;
+    int            rc = 0, partial = page->partial_pte;
+
+    pl3e = map_domain_page(_mfn(pfn));
+
+    /*
+     * PAE guests allocate full pages, but aren't required to initialize
+     * more than the first four entries; when running in compatibility
+     * mode, however, the full page is visible to the MMU, and hence all
+     * 512 entries must be valid/verified, which is most easily achieved
+     * by clearing them out.
+     */
+    if ( is_pv_32bit_domain(d) )
+        memset(pl3e + 4, 0, (L3_PAGETABLE_ENTRIES - 4) * sizeof(*pl3e));
+
+    for ( i = page->nr_validated_ptes; i < L3_PAGETABLE_ENTRIES;
+          i++, partial = 0 )
+    {
+        if ( is_pv_32bit_domain(d) && (i == 3) )
+        {
+            if ( !(l3e_get_flags(pl3e[i]) & _PAGE_PRESENT) ||
+                 (l3e_get_flags(pl3e[i]) & l3_disallow_mask(d)) )
+                rc = -EINVAL;
+            else
+                rc = get_page_and_type_from_mfn(
+                    _mfn(l3e_get_pfn(pl3e[i])),
+                    PGT_l2_page_table | PGT_pae_xen_l2, d, partial, true);
+        }
+        else if ( !is_guest_l3_slot(i) ||
+                  (rc = get_page_from_l3e(pl3e[i], pfn, d, partial)) > 0 )
+            continue;
+
+        if ( rc == -ERESTART )
+        {
+            page->nr_validated_ptes = i;
+            page->partial_pte = partial ?: 1;
+        }
+        else if ( rc == -EINTR && i )
+        {
+            page->nr_validated_ptes = i;
+            page->partial_pte = 0;
+            rc = -ERESTART;
+        }
+        if ( rc < 0 )
+            break;
+
+        adjust_guest_l3e(pl3e[i], d);
+    }
+
+    if ( rc >= 0 && !pv_create_pae_xen_mappings(d, pl3e) )
+        rc = -EINVAL;
+    if ( rc < 0 && rc != -ERESTART && rc != -EINTR )
+    {
+        gdprintk(XENLOG_WARNING, "Failure in alloc_l3_table: slot %#x\n", i);
+        if ( i )
+        {
+            page->nr_validated_ptes = i;
+            page->partial_pte = 0;
+            current->arch.old_guest_table = page;
+        }
+        while ( i-- > 0 )
+        {
+            if ( !is_guest_l3_slot(i) )
+                continue;
+            unadjust_guest_l3e(pl3e[i], d);
+        }
+    }
+
+    unmap_domain_page(pl3e);
+    return rc > 0 ? 0 : rc;
+}
+
+static int alloc_l4_table(struct page_info *page)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long  pfn = page_to_mfn(page);
+    l4_pgentry_t  *pl4e = map_domain_page(_mfn(pfn));
+    unsigned int   i;
+    int            rc = 0, partial = page->partial_pte;
+
+    for ( i = page->nr_validated_ptes; i < L4_PAGETABLE_ENTRIES;
+          i++, partial = 0 )
+    {
+        if ( !is_guest_l4_slot(d, i) ||
+             (rc = get_page_from_l4e(pl4e[i], pfn, d, partial)) > 0 )
+            continue;
+
+        if ( rc == -ERESTART )
+        {
+            page->nr_validated_ptes = i;
+            page->partial_pte = partial ?: 1;
+        }
+        else if ( rc < 0 )
+        {
+            if ( rc != -EINTR )
+                gdprintk(XENLOG_WARNING,
+                         "Failure in alloc_l4_table: slot %#x\n", i);
+            if ( i )
+            {
+                page->nr_validated_ptes = i;
+                page->partial_pte = 0;
+                if ( rc == -EINTR )
+                    rc = -ERESTART;
+                else
+                {
+                    if ( current->arch.old_guest_table )
+                        page->nr_validated_ptes++;
+                    current->arch.old_guest_table = page;
+                }
+            }
+        }
+        if ( rc < 0 )
+        {
+            unmap_domain_page(pl4e);
+            return rc;
+        }
+
+        adjust_guest_l4e(pl4e[i], d);
+    }
+
+    if ( rc >= 0 )
+    {
+        pv_init_guest_l4_table(pl4e, d, !VM_ASSIST(d, m2p_strict));
+        atomic_inc(&d->arch.pv_domain.nr_l4_pages);
+        rc = 0;
+    }
+    unmap_domain_page(pl4e);
+
+    return rc;
+}
+
+static void free_l1_table(struct page_info *page)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long pfn = page_to_mfn(page);
+    l1_pgentry_t *pl1e;
+    unsigned int  i;
+
+    pl1e = map_domain_page(_mfn(pfn));
+
+    for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
+        put_page_from_l1e(pl1e[i], d);
+
+    unmap_domain_page(pl1e);
+}
+
+static int free_l2_table(struct page_info *page, int preemptible)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long pfn = page_to_mfn(page);
+    l2_pgentry_t *pl2e;
+    unsigned int  i = page->nr_validated_ptes - 1;
+    int err = 0;
+
+    pl2e = map_domain_page(_mfn(pfn));
+
+    ASSERT(page->nr_validated_ptes);
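+    /* Walk the entries backwards, with a preemption point after each put. */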
+    do {
+        if ( is_guest_l2_slot(d, page->u.inuse.type_info, i) &&
+             put_page_from_l2e(pl2e[i], pfn) == 0 &&
+             preemptible && i && hypercall_preempt_check() )
+        {
+           page->nr_validated_ptes = i;
+           err = -ERESTART;
+        }
+    } while ( !err && i-- );
+
+    unmap_domain_page(pl2e);
+
+    if ( !err )
+        page->u.inuse.type_info &= ~PGT_pae_xen_l2;
+
+    return err;
+}
+
+static int free_l3_table(struct page_info *page)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long pfn = page_to_mfn(page);
+    l3_pgentry_t *pl3e;
+    int rc = 0, partial = page->partial_pte;
+    unsigned int  i = page->nr_validated_ptes - !partial;
+
+    pl3e = map_domain_page(_mfn(pfn));
+
+    do {
+        if ( is_guest_l3_slot(i) )
+        {
+            rc = put_page_from_l3e(pl3e[i], pfn, partial, 0);
+            if ( rc < 0 )
+                break;
+            partial = 0;
+            if ( rc > 0 )
+                continue;
+            unadjust_guest_l3e(pl3e[i], d);
+        }
+    } while ( i-- );
+
+    unmap_domain_page(pl3e);
+
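+    /*
+     * Record how far we got so the operation can be restarted; see the
+     * partial_pte description in asm-x86/mm.h for the encoding.
+     */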
+    if ( rc == -ERESTART )
+    {
+        page->nr_validated_ptes = i;
+        page->partial_pte = partial ?: -1;
+    }
+    else if ( rc == -EINTR && i < L3_PAGETABLE_ENTRIES - 1 )
+    {
+        page->nr_validated_ptes = i + 1;
+        page->partial_pte = 0;
+        rc = -ERESTART;
+    }
+    return rc > 0 ? 0 : rc;
+}
+
+static int free_l4_table(struct page_info *page)
+{
+    struct domain *d = page_get_owner(page);
+    unsigned long pfn = page_to_mfn(page);
+    l4_pgentry_t *pl4e = map_domain_page(_mfn(pfn));
+    int rc = 0, partial = page->partial_pte;
+    unsigned int  i = page->nr_validated_ptes - !partial;
+
+    do {
+        if ( is_guest_l4_slot(d, i) )
+            rc = put_page_from_l4e(pl4e[i], pfn, partial, 0);
+        if ( rc < 0 )
+            break;
+        partial = 0;
+    } while ( i-- );
+
+    if ( rc == -ERESTART )
+    {
+        page->nr_validated_ptes = i;
+        page->partial_pte = partial ?: -1;
+    }
+    else if ( rc == -EINTR && i < L4_PAGETABLE_ENTRIES - 1 )
+    {
+        page->nr_validated_ptes = i + 1;
+        page->partial_pte = 0;
+        rc = -ERESTART;
+    }
+
+    unmap_domain_page(pl4e);
+
+    if ( rc >= 0 )
+    {
+        atomic_dec(&d->arch.pv_domain.nr_l4_pages);
+        rc = 0;
+    }
+
+    return rc;
+}
+
+static int alloc_segdesc_page(struct page_info *page)
+{
+    const struct domain *owner = page_get_owner(page);
+    struct desc_struct *descs = __map_domain_page(page);
+    unsigned int i;
+
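+    /* A 4k page holds 512 8-byte descriptors; validate every one. */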
+    for ( i = 0; i < 512; i++ )
+        if ( unlikely(!check_descriptor(owner, &descs[i])) )
+            break;
+
+    unmap_domain_page(descs);
+
+    return i == 512 ? 0 : -EINVAL;
+}
+
+int pv_alloc_page_type(struct page_info *page, unsigned long type,
+                       bool preemptible)
+{
+    struct domain *owner = page_get_owner(page);
+    int rc;
+
+    /* A page table is dirtied when its type count becomes non-zero. */
+    if ( likely(owner != NULL) )
+        paging_mark_dirty(owner, _mfn(page_to_mfn(page)));
+
+    switch ( type & PGT_type_mask )
+    {
+    case PGT_l1_page_table:
+        rc = alloc_l1_table(page);
+        break;
+    case PGT_l2_page_table:
+        rc = alloc_l2_table(page, type, preemptible);
+        break;
+    case PGT_l3_page_table:
+        ASSERT(preemptible);
+        rc = alloc_l3_table(page);
+        break;
+    case PGT_l4_page_table:
+        ASSERT(preemptible);
+        rc = alloc_l4_table(page);
+        break;
+    case PGT_seg_desc_page:
+        rc = alloc_segdesc_page(page);
+        break;
+    default:
+        printk("Bad type in alloc_page_type %lx t=%" PRtype_info " c=%lx\n",
+               type, page->u.inuse.type_info,
+               page->count_info);
+        rc = -EINVAL;
+        BUG();
+    }
+
+    /* No need for atomic update of type_info here: no one else updates it. */
+    smp_wmb();
+    switch ( rc )
+    {
+    case 0:
+        page->u.inuse.type_info |= PGT_validated;
+        break;
+    case -EINTR:
+        ASSERT((page->u.inuse.type_info &
+                (PGT_count_mask|PGT_validated|PGT_partial)) == 1);
+        page->u.inuse.type_info &= ~PGT_count_mask;
+        break;
+    default:
+        ASSERT(rc < 0);
+        gdprintk(XENLOG_WARNING, "Error while validating mfn %" PRI_mfn
+                 " (pfn %" PRI_pfn ") for type %" PRtype_info
+                 ": caf=%08lx taf=%" PRtype_info "\n",
+                 page_to_mfn(page), get_gpfn_from_mfn(page_to_mfn(page)),
+                 type, page->count_info, page->u.inuse.type_info);
+        if ( page != current->arch.old_guest_table )
+            page->u.inuse.type_info = 0;
+        else
+        {
+            ASSERT((page->u.inuse.type_info &
+                    (PGT_count_mask | PGT_validated)) == 1);
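+            /*
+             * -ERESTART joins this path: take a light reference and mark
+             * the page PGT_partial so validation can be resumed later.
+             */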
+    case -ERESTART:
+            get_page_light(page);
+            page->u.inuse.type_info |= PGT_partial;
+        }
+        break;
+    }
+
+    return rc;
+}
+
+int pv_free_page_type(struct page_info *page, unsigned long type,
+                      bool preemptible)
+{
+    struct domain *owner = page_get_owner(page);
+    unsigned long gmfn;
+    int rc;
+
+    if ( likely(owner != NULL) && unlikely(paging_mode_enabled(owner)) )
+    {
+        /* A page table is dirtied when its type count becomes zero. */
+        paging_mark_dirty(owner, _mfn(page_to_mfn(page)));
+
+        ASSERT(!shadow_mode_refcounts(owner));
+
+        gmfn = mfn_to_gmfn(owner, page_to_mfn(page));
+        ASSERT(VALID_M2P(gmfn));
+        /* Page sharing not supported for shadowed domains */
+        if ( !SHARED_M2P(gmfn) )
+            shadow_remove_all_shadows(owner, _mfn(gmfn));
+    }
+
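+    /* Unless a previous attempt was preempted, de-validate all entries. */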
+    if ( !(type & PGT_partial) )
+    {
+        page->nr_validated_ptes = 1U << PAGETABLE_ORDER;
+        page->partial_pte = 0;
+    }
+
+    switch ( type & PGT_type_mask )
+    {
+    case PGT_l1_page_table:
+        free_l1_table(page);
+        rc = 0;
+        break;
+    case PGT_l2_page_table:
+        rc = free_l2_table(page, preemptible);
+        break;
+    case PGT_l3_page_table:
+        ASSERT(preemptible);
+        rc = free_l3_table(page);
+        break;
+    case PGT_l4_page_table:
+        ASSERT(preemptible);
+        rc = free_l4_table(page);
+        break;
+    default:
+        gdprintk(XENLOG_WARNING, "type %" PRtype_info " mfn %" PRI_mfn "\n",
+                 type, page_to_mfn(page));
+        rc = -EINVAL;
+        BUG();
+    }
+
+    return rc;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 6857651db1..7480341240 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -302,9 +302,6 @@ static inline void *__page_to_virt(const struct page_info *pg)
                     (PAGE_SIZE / (sizeof(*pg) & -sizeof(*pg))));
 }
 
-int pv_free_page_type(struct page_info *page, unsigned long type,
-                      bool preemptible);
-
 bool_t fill_ro_mpt(unsigned long mfn);
 void zap_ro_mpt(unsigned long mfn);
 
diff --git a/xen/include/asm-x86/pv/mm.h b/xen/include/asm-x86/pv/mm.h
index 0192580b41..841666e7a0 100644
--- a/xen/include/asm-x86/pv/mm.h
+++ b/xen/include/asm-x86/pv/mm.h
@@ -96,6 +96,11 @@ void pv_arch_init_memory(void);
 
 int pv_new_guest_cr3(unsigned long pfn);
 
+int pv_alloc_page_type(struct page_info *page, unsigned long type,
+                       bool preemptible);
+int pv_free_page_type(struct page_info *page, unsigned long type,
+                      bool preemptible);
+
 #else
 
 #include <xen/errno.h>
@@ -126,6 +131,13 @@ static inline void pv_arch_init_memory(void) {}
 
 static inline int pv_new_guest_cr3(unsigned long pfn) { return -EINVAL; }
 
+static inline int pv_alloc_page_type(struct page_info *page, unsigned long type,
+                                     bool preemptible)
+{ return -EINVAL; }
+static inline int pv_free_page_type(struct page_info *page, unsigned long type,
+                      bool preemptible)
+{ return -EINVAL; }
+
 #endif
 
 #endif /* __X86_PV_MM_H__ */
-- 
2.11.0



* [PATCH v3 extra 06/11] x86/mm: move and add pv_ prefix to invalidate_shadow_ldt
  2017-07-30 15:43 ` [PATCH v3 extra 00/11] x86: refactor mm.c: page APIs and hypercalls Wei Liu
                     ` (4 preceding siblings ...)
  2017-07-30 15:43   ` [PATCH v3 extra 05/11] x86/mm: move pv_{alloc, free}_page_type Wei Liu
@ 2017-07-30 15:43   ` Wei Liu
  2017-07-30 15:43   ` [PATCH v3 extra 07/11] x86/mm: move PV hypercalls to pv/mm-hypercalls.c Wei Liu
                     ` (5 subsequent siblings)
  11 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-30 15:43 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Move the code to pv/mm.c and export it via pv/mm.h. Use bool for the
flush parameter.
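
For illustration only (the exact call sites are in the diff below),
callers change along these lines:

    invalidate_shadow_ldt(v, 1);        /* old */
    pv_invalidate_shadow_ldt(v, true);  /* new */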

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c           | 44 ++++----------------------------------------
 xen/arch/x86/pv/mm.c        | 35 +++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/pv/mm.h |  4 ++++
 3 files changed, 43 insertions(+), 40 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index ac0e0ba346..76ce5aef68 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -478,42 +478,6 @@ static inline void page_set_tlbflush_timestamp(struct page_info *page)
 const char __section(".bss.page_aligned.const") __aligned(PAGE_SIZE)
     zero_page[PAGE_SIZE];
 
-static void invalidate_shadow_ldt(struct vcpu *v, int flush)
-{
-    l1_pgentry_t *pl1e;
-    unsigned int i;
-    struct page_info *page;
-
-    BUG_ON(unlikely(in_irq()));
-
-    spin_lock(&v->arch.pv_vcpu.shadow_ldt_lock);
-
-    if ( v->arch.pv_vcpu.shadow_ldt_mapcnt == 0 )
-        goto out;
-
-    v->arch.pv_vcpu.shadow_ldt_mapcnt = 0;
-    pl1e = gdt_ldt_ptes(v->domain, v);
-
-    for ( i = 16; i < 32; i++ )
-    {
-        if ( !(l1e_get_flags(pl1e[i]) & _PAGE_PRESENT) )
-            continue;
-        page = l1e_get_page(pl1e[i]);
-        l1e_write(&pl1e[i], l1e_empty());
-        ASSERT_PAGE_IS_TYPE(page, PGT_seg_desc_page);
-        ASSERT_PAGE_IS_DOMAIN(page, v->domain);
-        put_page_and_type(page);
-    }
-
-    /* Rid TLBs of stale mappings (guest mappings and shadow mappings). */
-    if ( flush )
-        flush_tlb_mask(v->vcpu_dirty_cpumask);
-
- out:
-    spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock);
-}
-
-
 bool get_page_from_mfn(mfn_t mfn, struct domain *d)
 {
     struct page_info *page = mfn_to_page(mfn_x(mfn));
@@ -1059,7 +1023,7 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
              (l1e_owner == pg_owner) )
         {
             for_each_vcpu ( pg_owner, v )
-                invalidate_shadow_ldt(v, 1);
+                pv_invalidate_shadow_ldt(v, true);
         }
         put_page(page);
     }
@@ -1958,7 +1922,7 @@ int pv_new_guest_cr3(unsigned long mfn)
             return rc;
         }
 
-        invalidate_shadow_ldt(curr, 0);
+        pv_invalidate_shadow_ldt(curr, false);
         write_ptbase(curr);
 
         return 0;
@@ -1996,7 +1960,7 @@ int pv_new_guest_cr3(unsigned long mfn)
         return rc;
     }
 
-    invalidate_shadow_ldt(curr, 0);
+    pv_invalidate_shadow_ldt(curr, false);
 
     if ( !VM_ASSIST(currd, m2p_strict) && !paging_mode_refcounts(currd) )
         fill_ro_mpt(mfn);
@@ -2496,7 +2460,7 @@ long do_mmuext_op(
             else if ( (curr->arch.pv_vcpu.ldt_ents != ents) ||
                       (curr->arch.pv_vcpu.ldt_base != ptr) )
             {
-                invalidate_shadow_ldt(curr, 0);
+                pv_invalidate_shadow_ldt(curr, false);
                 flush_tlb_local();
                 curr->arch.pv_vcpu.ldt_base = ptr;
                 curr->arch.pv_vcpu.ldt_ents = ents;
diff --git a/xen/arch/x86/pv/mm.c b/xen/arch/x86/pv/mm.c
index f0393b9e3c..19b2ae588e 100644
--- a/xen/arch/x86/pv/mm.c
+++ b/xen/arch/x86/pv/mm.c
@@ -742,6 +742,41 @@ int pv_free_page_type(struct page_info *page, unsigned long type,
     return rc;
 }
 
+void pv_invalidate_shadow_ldt(struct vcpu *v, bool flush)
+{
+    l1_pgentry_t *pl1e;
+    unsigned int i;
+    struct page_info *page;
+
+    BUG_ON(unlikely(in_irq()));
+
+    spin_lock(&v->arch.pv_vcpu.shadow_ldt_lock);
+
+    if ( v->arch.pv_vcpu.shadow_ldt_mapcnt == 0 )
+        goto out;
+
+    v->arch.pv_vcpu.shadow_ldt_mapcnt = 0;
+    pl1e = gdt_ldt_ptes(v->domain, v);
+
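+    /* Slots 16-31 of the per-vCPU GDT/LDT range hold the shadow LDT. */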
+    for ( i = 16; i < 32; i++ )
+    {
+        if ( !(l1e_get_flags(pl1e[i]) & _PAGE_PRESENT) )
+            continue;
+        page = l1e_get_page(pl1e[i]);
+        l1e_write(&pl1e[i], l1e_empty());
+        ASSERT_PAGE_IS_TYPE(page, PGT_seg_desc_page);
+        ASSERT_PAGE_IS_DOMAIN(page, v->domain);
+        put_page_and_type(page);
+    }
+
+    /* Rid TLBs of stale mappings (guest mappings and shadow mappings). */
+    if ( flush )
+        flush_tlb_mask(v->vcpu_dirty_cpumask);
+
+ out:
+    spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/pv/mm.h b/xen/include/asm-x86/pv/mm.h
index 841666e7a0..664d7c3868 100644
--- a/xen/include/asm-x86/pv/mm.h
+++ b/xen/include/asm-x86/pv/mm.h
@@ -101,6 +101,8 @@ int pv_alloc_page_type(struct page_info *page, unsigned long type,
 int pv_free_page_type(struct page_info *page, unsigned long type,
                       bool preemptible);
 
+void pv_invalidate_shadow_ldt(struct vcpu *v, bool flush);
+
 #else
 
 #include <xen/errno.h>
@@ -138,6 +140,8 @@ static inline int pv_free_page_type(struct page_info *page, unsigned long type,
                       bool preemptible)
 { return -EINVAL; }
 
+static inline void pv_invalidate_shadow_ldt(struct vcpu *v, bool flush) {}
+
 #endif
 
 #endif /* __X86_PV_MM_H__ */
-- 
2.11.0



* [PATCH v3 extra 07/11] x86/mm: move PV hypercalls to pv/mm-hypercalls.c
  2017-07-30 15:43 ` [PATCH v3 extra 00/11] x86: refactor mm.c: page APIs and hypercalls Wei Liu
                     ` (5 preceding siblings ...)
  2017-07-30 15:43   ` [PATCH v3 extra 06/11] x86/mm: move and add pv_ prefix to invalidate_shadow_ldt Wei Liu
@ 2017-07-30 15:43   ` Wei Liu
  2017-07-30 15:43   ` [PATCH v3 extra 08/11] x86/mm: remove the now unused inclusion of pv/mm.h Wei Liu
                     ` (4 subsequent siblings)
  11 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-30 15:43 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Also move pv_new_guest_cr3 there so that we don't have to export
mod_l1_entry.

Fix coding style issues. Change v to curr and d to currd where
appropriate.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
I can't convince git diff to produce a sensible diff for donate_page and
steal_page.  Those functions are unchanged.
---
 xen/arch/x86/mm.c               | 1565 ++-------------------------------------
 xen/arch/x86/pv/Makefile        |    1 +
 xen/arch/x86/pv/mm-hypercalls.c | 1461 ++++++++++++++++++++++++++++++++++++
 3 files changed, 1531 insertions(+), 1496 deletions(-)
 create mode 100644 xen/arch/x86/pv/mm-hypercalls.c

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 76ce5aef68..d232076459 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1168,290 +1168,6 @@ void page_unlock(struct page_info *page)
     } while ( (y = cmpxchg(&page->u.inuse.type_info, x, nx)) != x );
 }
 
-/*
- * PTE flags that a guest may change without re-validating the PTE.
- * All other bits affect translation, caching, or Xen's safety.
- */
-#define FASTPATH_FLAG_WHITELIST                                     \
-    (_PAGE_NX_BIT | _PAGE_AVAIL_HIGH | _PAGE_AVAIL | _PAGE_GLOBAL | \
-     _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_USER)
-
-/* Update the L1 entry at pl1e to new value nl1e. */
-static int mod_l1_entry(l1_pgentry_t *pl1e, l1_pgentry_t nl1e,
-                        unsigned long gl1mfn, int preserve_ad,
-                        struct vcpu *pt_vcpu, struct domain *pg_dom)
-{
-    l1_pgentry_t ol1e;
-    struct domain *pt_dom = pt_vcpu->domain;
-    int rc = 0;
-
-    if ( unlikely(__copy_from_user(&ol1e, pl1e, sizeof(ol1e)) != 0) )
-        return -EFAULT;
-
-    ASSERT(!paging_mode_refcounts(pt_dom));
-
-    if ( l1e_get_flags(nl1e) & _PAGE_PRESENT )
-    {
-        /* Translate foreign guest addresses. */
-        struct page_info *page = NULL;
-
-        if ( unlikely(l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom)) )
-        {
-            gdprintk(XENLOG_WARNING, "Bad L1 flags %x\n",
-                    l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom));
-            return -EINVAL;
-        }
-
-        if ( paging_mode_translate(pg_dom) )
-        {
-            page = get_page_from_gfn(pg_dom, l1e_get_pfn(nl1e), NULL, P2M_ALLOC);
-            if ( !page )
-                return -EINVAL;
-            nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(nl1e));
-        }
-
-        /* Fast path for sufficiently-similar mappings. */
-        if ( !l1e_has_changed(ol1e, nl1e, ~FASTPATH_FLAG_WHITELIST) )
-        {
-            adjust_guest_l1e(nl1e, pt_dom);
-            rc = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
-                              preserve_ad);
-            if ( page )
-                put_page(page);
-            return rc ? 0 : -EBUSY;
-        }
-
-        switch ( rc = get_page_from_l1e(nl1e, pt_dom, pg_dom) )
-        {
-        default:
-            if ( page )
-                put_page(page);
-            return rc;
-        case 0:
-            break;
-        case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
-            ASSERT(!(rc & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
-            l1e_flip_flags(nl1e, rc);
-            rc = 0;
-            break;
-        }
-        if ( page )
-            put_page(page);
-
-        adjust_guest_l1e(nl1e, pt_dom);
-        if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
-                                    preserve_ad)) )
-        {
-            ol1e = nl1e;
-            rc = -EBUSY;
-        }
-    }
-    else if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
-                                     preserve_ad)) )
-    {
-        return -EBUSY;
-    }
-
-    put_page_from_l1e(ol1e, pt_dom);
-    return rc;
-}
-
-
-/* Update the L2 entry at pl2e to new value nl2e. pl2e is within frame pfn. */
-static int mod_l2_entry(l2_pgentry_t *pl2e,
-                        l2_pgentry_t nl2e,
-                        unsigned long pfn,
-                        int preserve_ad,
-                        struct vcpu *vcpu)
-{
-    l2_pgentry_t ol2e;
-    struct domain *d = vcpu->domain;
-    struct page_info *l2pg = mfn_to_page(pfn);
-    unsigned long type = l2pg->u.inuse.type_info;
-    int rc = 0;
-
-    if ( unlikely(!is_guest_l2_slot(d, type, pgentry_ptr_to_slot(pl2e))) )
-    {
-        gdprintk(XENLOG_WARNING, "L2 update in Xen-private area, slot %#lx\n",
-                 pgentry_ptr_to_slot(pl2e));
-        return -EPERM;
-    }
-
-    if ( unlikely(__copy_from_user(&ol2e, pl2e, sizeof(ol2e)) != 0) )
-        return -EFAULT;
-
-    if ( l2e_get_flags(nl2e) & _PAGE_PRESENT )
-    {
-        if ( unlikely(l2e_get_flags(nl2e) & L2_DISALLOW_MASK) )
-        {
-            gdprintk(XENLOG_WARNING, "Bad L2 flags %x\n",
-                    l2e_get_flags(nl2e) & L2_DISALLOW_MASK);
-            return -EINVAL;
-        }
-
-        /* Fast path for sufficiently-similar mappings. */
-        if ( !l2e_has_changed(ol2e, nl2e, ~FASTPATH_FLAG_WHITELIST) )
-        {
-            adjust_guest_l2e(nl2e, d);
-            if ( UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu, preserve_ad) )
-                return 0;
-            return -EBUSY;
-        }
-
-        if ( unlikely((rc = get_page_from_l2e(nl2e, pfn, d)) < 0) )
-            return rc;
-
-        adjust_guest_l2e(nl2e, d);
-        if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu,
-                                    preserve_ad)) )
-        {
-            ol2e = nl2e;
-            rc = -EBUSY;
-        }
-    }
-    else if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu,
-                                     preserve_ad)) )
-    {
-        return -EBUSY;
-    }
-
-    put_page_from_l2e(ol2e, pfn);
-    return rc;
-}
-
-/* Update the L3 entry at pl3e to new value nl3e. pl3e is within frame pfn. */
-static int mod_l3_entry(l3_pgentry_t *pl3e,
-                        l3_pgentry_t nl3e,
-                        unsigned long pfn,
-                        int preserve_ad,
-                        struct vcpu *vcpu)
-{
-    l3_pgentry_t ol3e;
-    struct domain *d = vcpu->domain;
-    int rc = 0;
-
-    if ( unlikely(!is_guest_l3_slot(pgentry_ptr_to_slot(pl3e))) )
-    {
-        gdprintk(XENLOG_WARNING, "L3 update in Xen-private area, slot %#lx\n",
-                 pgentry_ptr_to_slot(pl3e));
-        return -EINVAL;
-    }
-
-    /*
-     * Disallow updates to final L3 slot. It contains Xen mappings, and it
-     * would be a pain to ensure they remain continuously valid throughout.
-     */
-    if ( is_pv_32bit_domain(d) && (pgentry_ptr_to_slot(pl3e) >= 3) )
-        return -EINVAL;
-
-    if ( unlikely(__copy_from_user(&ol3e, pl3e, sizeof(ol3e)) != 0) )
-        return -EFAULT;
-
-    if ( l3e_get_flags(nl3e) & _PAGE_PRESENT )
-    {
-        if ( unlikely(l3e_get_flags(nl3e) & l3_disallow_mask(d)) )
-        {
-            gdprintk(XENLOG_WARNING, "Bad L3 flags %x\n",
-                    l3e_get_flags(nl3e) & l3_disallow_mask(d));
-            return -EINVAL;
-        }
-
-        /* Fast path for sufficiently-similar mappings. */
-        if ( !l3e_has_changed(ol3e, nl3e, ~FASTPATH_FLAG_WHITELIST) )
-        {
-            adjust_guest_l3e(nl3e, d);
-            rc = UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu, preserve_ad);
-            return rc ? 0 : -EFAULT;
-        }
-
-        rc = get_page_from_l3e(nl3e, pfn, d, 0);
-        if ( unlikely(rc < 0) )
-            return rc;
-        rc = 0;
-
-        adjust_guest_l3e(nl3e, d);
-        if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu,
-                                    preserve_ad)) )
-        {
-            ol3e = nl3e;
-            rc = -EFAULT;
-        }
-    }
-    else if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu,
-                                     preserve_ad)) )
-    {
-        return -EFAULT;
-    }
-
-    if ( likely(rc == 0) )
-        if ( !pv_create_pae_xen_mappings(d, pl3e) )
-            BUG();
-
-    put_page_from_l3e(ol3e, pfn, 0, 1);
-    return rc;
-}
-
-/* Update the L4 entry at pl4e to new value nl4e. pl4e is within frame pfn. */
-static int mod_l4_entry(l4_pgentry_t *pl4e,
-                        l4_pgentry_t nl4e,
-                        unsigned long pfn,
-                        int preserve_ad,
-                        struct vcpu *vcpu)
-{
-    struct domain *d = vcpu->domain;
-    l4_pgentry_t ol4e;
-    int rc = 0;
-
-    if ( unlikely(!is_guest_l4_slot(d, pgentry_ptr_to_slot(pl4e))) )
-    {
-        gdprintk(XENLOG_WARNING, "L4 update in Xen-private area, slot %#lx\n",
-                 pgentry_ptr_to_slot(pl4e));
-        return -EINVAL;
-    }
-
-    if ( unlikely(__copy_from_user(&ol4e, pl4e, sizeof(ol4e)) != 0) )
-        return -EFAULT;
-
-    if ( l4e_get_flags(nl4e) & _PAGE_PRESENT )
-    {
-        if ( unlikely(l4e_get_flags(nl4e) & L4_DISALLOW_MASK) )
-        {
-            gdprintk(XENLOG_WARNING, "Bad L4 flags %x\n",
-                    l4e_get_flags(nl4e) & L4_DISALLOW_MASK);
-            return -EINVAL;
-        }
-
-        /* Fast path for sufficiently-similar mappings. */
-        if ( !l4e_has_changed(ol4e, nl4e, ~FASTPATH_FLAG_WHITELIST) )
-        {
-            adjust_guest_l4e(nl4e, d);
-            rc = UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu, preserve_ad);
-            return rc ? 0 : -EFAULT;
-        }
-
-        rc = get_page_from_l4e(nl4e, pfn, d, 0);
-        if ( unlikely(rc < 0) )
-            return rc;
-        rc = 0;
-
-        adjust_guest_l4e(nl4e, d);
-        if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu,
-                                    preserve_ad)) )
-        {
-            ol4e = nl4e;
-            rc = -EFAULT;
-        }
-    }
-    else if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu,
-                                     preserve_ad)) )
-    {
-        return -EFAULT;
-    }
-
-    put_page_from_l4e(ol4e, pfn, 0, 1);
-    return rc;
-}
-
 static int cleanup_page_cacheattr(struct page_info *page)
 {
     unsigned int cacheattr =
@@ -1890,1123 +1606,96 @@ int vcpu_destroy_pagetables(struct vcpu *v)
     return rc != -EINTR ? rc : -ERESTART;
 }
 
-int pv_new_guest_cr3(unsigned long mfn)
+int donate_page(
+    struct domain *d, struct page_info *page, unsigned int memflags)
 {
-    struct vcpu *curr = current;
-    struct domain *currd = curr->domain;
-    int rc;
-    unsigned long old_base_mfn;
-
-    if ( is_pv_32bit_domain(currd) )
-    {
-        unsigned long gt_mfn = pagetable_get_pfn(curr->arch.guest_table);
-        l4_pgentry_t *pl4e = map_domain_page(_mfn(gt_mfn));
-
-        rc = mod_l4_entry(pl4e,
-                          l4e_from_pfn(mfn,
-                                       (_PAGE_PRESENT | _PAGE_RW |
-                                        _PAGE_USER | _PAGE_ACCESSED)),
-                          gt_mfn, 0, curr);
-        unmap_domain_page(pl4e);
-        switch ( rc )
-        {
-        case 0:
-            break;
-        case -EINTR:
-        case -ERESTART:
-            return -ERESTART;
-        default:
-            gdprintk(XENLOG_WARNING,
-                     "Error while installing new compat baseptr %" PRI_mfn "\n",
-                     mfn);
-            return rc;
-        }
+    const struct domain *owner = dom_xen;
 
-        pv_invalidate_shadow_ldt(curr, false);
-        write_ptbase(curr);
+    spin_lock(&d->page_alloc_lock);
 
-        return 0;
-    }
+    if ( is_xen_heap_page(page) || ((owner = page_get_owner(page)) != NULL) )
+        goto fail;
 
-    rc = put_old_guest_table(curr);
-    if ( unlikely(rc) )
-        return rc;
+    if ( d->is_dying )
+        goto fail;
 
-    old_base_mfn = pagetable_get_pfn(curr->arch.guest_table);
-    /*
-     * This is particularly important when getting restarted after the
-     * previous attempt got preempted in the put-old-MFN phase.
-     */
-    if ( old_base_mfn == mfn )
-    {
-        write_ptbase(curr);
-        return 0;
-    }
+    if ( page->count_info & ~(PGC_allocated | 1) )
+        goto fail;
 
-    rc = paging_mode_refcounts(currd)
-         ? (get_page_from_mfn(_mfn(mfn), currd) ? 0 : -EINVAL)
-         : get_page_and_type_from_mfn(_mfn(mfn), PGT_root_page_table,
-                                      currd, 0, true);
-    switch ( rc )
+    if ( !(memflags & MEMF_no_refcount) )
     {
-    case 0:
-        break;
-    case -EINTR:
-    case -ERESTART:
-        return -ERESTART;
-    default:
-        gdprintk(XENLOG_WARNING,
-                 "Error while installing new baseptr %" PRI_mfn "\n", mfn);
-        return rc;
+        if ( d->tot_pages >= d->max_pages )
+            goto fail;
+        domain_adjust_tot_pages(d, 1);
     }
 
-    pv_invalidate_shadow_ldt(curr, false);
-
-    if ( !VM_ASSIST(currd, m2p_strict) && !paging_mode_refcounts(currd) )
-        fill_ro_mpt(mfn);
-    curr->arch.guest_table = pagetable_from_pfn(mfn);
-    update_cr3(curr);
-
-    write_ptbase(curr);
-
-    if ( likely(old_base_mfn != 0) )
-    {
-        struct page_info *page = mfn_to_page(old_base_mfn);
+    page->count_info = PGC_allocated | 1;
+    page_set_owner(page, d);
+    page_list_add_tail(page, &d->page_list);
 
-        if ( paging_mode_refcounts(currd) )
-            put_page(page);
-        else
-            switch ( rc = put_page_and_type_preemptible(page) )
-            {
-            case -EINTR:
-                rc = -ERESTART;
-                /* fallthrough */
-            case -ERESTART:
-                curr->arch.old_guest_table = page;
-                break;
-            default:
-                BUG_ON(rc);
-                break;
-            }
-    }
+    spin_unlock(&d->page_alloc_lock);
+    return 0;
 
-    return rc;
+ fail:
+    spin_unlock(&d->page_alloc_lock);
+    gdprintk(XENLOG_WARNING, "Bad donate mfn %" PRI_mfn
+             " to d%d (owner d%d) caf=%08lx taf=%" PRtype_info "\n",
+             page_to_mfn(page), d->domain_id,
+             owner ? owner->domain_id : DOMID_INVALID,
+             page->count_info, page->u.inuse.type_info);
+    return -EINVAL;
 }
 
-static struct domain *get_pg_owner(domid_t domid)
+int steal_page(
+    struct domain *d, struct page_info *page, unsigned int memflags)
 {
-    struct domain *pg_owner = NULL, *curr = current->domain;
+    unsigned long x, y;
+    bool drop_dom_ref = false;
+    const struct domain *owner = dom_xen;
 
-    if ( likely(domid == DOMID_SELF) )
-    {
-        pg_owner = rcu_lock_current_domain();
-        goto out;
-    }
+    if ( paging_mode_external(d) )
+        return -EOPNOTSUPP;
 
-    if ( unlikely(domid == curr->domain_id) )
-    {
-        gdprintk(XENLOG_WARNING, "Cannot specify itself as foreign domain\n");
-        goto out;
-    }
+    spin_lock(&d->page_alloc_lock);
 
-    switch ( domid )
-    {
-    case DOMID_IO:
-        pg_owner = rcu_lock_domain(dom_io);
-        break;
-    case DOMID_XEN:
-        pg_owner = rcu_lock_domain(dom_xen);
-        break;
-    default:
-        if ( (pg_owner = rcu_lock_domain_by_id(domid)) == NULL )
-        {
-            gdprintk(XENLOG_WARNING, "Unknown domain d%d\n", domid);
-            break;
-        }
-        break;
-    }
+    if ( is_xen_heap_page(page) || ((owner = page_get_owner(page)) != d) )
+        goto fail;
 
- out:
-    return pg_owner;
-}
+    /*
+     * We require there is just one reference (PGC_allocated). We temporarily
+     * drop this reference now so that we can safely swizzle the owner.
+     */
+    y = page->count_info;
+    do {
+        x = y;
+        if ( (x & (PGC_count_mask|PGC_allocated)) != (1 | PGC_allocated) )
+            goto fail;
+        y = cmpxchg(&page->count_info, x, x & ~PGC_count_mask);
+    } while ( y != x );
 
-static void put_pg_owner(struct domain *pg_owner)
-{
-    rcu_unlock_domain(pg_owner);
-}
+    /*
+     * With the sole reference dropped temporarily, no-one can update type
+     * information. Type count also needs to be zero in this case, but e.g.
+     * PGT_seg_desc_page may still have PGT_validated set, which we need to
+     * clear before transferring ownership (as validation criteria vary
+     * depending on domain type).
+     */
+    BUG_ON(page->u.inuse.type_info & (PGT_count_mask | PGT_locked |
+                                      PGT_pinned));
+    page->u.inuse.type_info = 0;
 
-static inline int vcpumask_to_pcpumask(
-    struct domain *d, XEN_GUEST_HANDLE_PARAM(const_void) bmap, cpumask_t *pmask)
-{
-    unsigned int vcpu_id, vcpu_bias, offs;
-    unsigned long vmask;
-    struct vcpu *v;
-    bool is_native = !is_pv_32bit_domain(d);
+    /* Swizzle the owner then reinstate the PGC_allocated reference. */
+    page_set_owner(page, NULL);
+    y = page->count_info;
+    do {
+        x = y;
+        BUG_ON((x & (PGC_count_mask|PGC_allocated)) != PGC_allocated);
+    } while ( (y = cmpxchg(&page->count_info, x, x | 1)) != x );
 
-    cpumask_clear(pmask);
-    for ( vmask = 0, offs = 0; ; ++offs )
-    {
-        vcpu_bias = offs * (is_native ? BITS_PER_LONG : 32);
-        if ( vcpu_bias >= d->max_vcpus )
-            return 0;
-
-        if ( unlikely(is_native ?
-                      copy_from_guest_offset(&vmask, bmap, offs, 1) :
-                      copy_from_guest_offset((unsigned int *)&vmask, bmap,
-                                             offs, 1)) )
-        {
-            cpumask_clear(pmask);
-            return -EFAULT;
-        }
-
-        while ( vmask )
-        {
-            vcpu_id = find_first_set_bit(vmask);
-            vmask &= ~(1UL << vcpu_id);
-            vcpu_id += vcpu_bias;
-            if ( (vcpu_id >= d->max_vcpus) )
-                return 0;
-            if ( ((v = d->vcpu[vcpu_id]) != NULL) )
-                cpumask_or(pmask, pmask, v->vcpu_dirty_cpumask);
-        }
-    }
-}
-
-long do_mmuext_op(
-    XEN_GUEST_HANDLE_PARAM(mmuext_op_t) uops,
-    unsigned int count,
-    XEN_GUEST_HANDLE_PARAM(uint) pdone,
-    unsigned int foreigndom)
-{
-    struct mmuext_op op;
-    unsigned long type;
-    unsigned int i, done = 0;
-    struct vcpu *curr = current;
-    struct domain *currd = curr->domain;
-    struct domain *pg_owner;
-    int rc = put_old_guest_table(curr);
-
-    if ( unlikely(rc) )
-    {
-        if ( likely(rc == -ERESTART) )
-            rc = hypercall_create_continuation(
-                     __HYPERVISOR_mmuext_op, "hihi", uops, count, pdone,
-                     foreigndom);
-        return rc;
-    }
-
-    if ( unlikely(count == MMU_UPDATE_PREEMPTED) &&
-         likely(guest_handle_is_null(uops)) )
-    {
-        /*
-         * See the curr->arch.old_guest_table related
-         * hypercall_create_continuation() below.
-         */
-        return (int)foreigndom;
-    }
-
-    if ( unlikely(count & MMU_UPDATE_PREEMPTED) )
-    {
-        count &= ~MMU_UPDATE_PREEMPTED;
-        if ( unlikely(!guest_handle_is_null(pdone)) )
-            (void)copy_from_guest(&done, pdone, 1);
-    }
-    else
-        perfc_incr(calls_to_mmuext_op);
-
-    if ( unlikely(!guest_handle_okay(uops, count)) )
-        return -EFAULT;
-
-    if ( (pg_owner = get_pg_owner(foreigndom)) == NULL )
-        return -ESRCH;
-
-    if ( !is_pv_domain(pg_owner) )
-    {
-        put_pg_owner(pg_owner);
-        return -EINVAL;
-    }
-
-    rc = xsm_mmuext_op(XSM_TARGET, currd, pg_owner);
-    if ( rc )
-    {
-        put_pg_owner(pg_owner);
-        return rc;
-    }
-
-    for ( i = 0; i < count; i++ )
-    {
-        if ( curr->arch.old_guest_table || (i && hypercall_preempt_check()) )
-        {
-            rc = -ERESTART;
-            break;
-        }
-
-        if ( unlikely(__copy_from_guest(&op, uops, 1) != 0) )
-        {
-            rc = -EFAULT;
-            break;
-        }
-
-        if ( is_hvm_domain(currd) )
-        {
-            switch ( op.cmd )
-            {
-            case MMUEXT_PIN_L1_TABLE:
-            case MMUEXT_PIN_L2_TABLE:
-            case MMUEXT_PIN_L3_TABLE:
-            case MMUEXT_PIN_L4_TABLE:
-            case MMUEXT_UNPIN_TABLE:
-                break;
-            default:
-                rc = -EOPNOTSUPP;
-                goto done;
-            }
-        }
-
-        rc = 0;
-
-        switch ( op.cmd )
-        {
-            struct page_info *page;
-            p2m_type_t p2mt;
-
-        case MMUEXT_PIN_L1_TABLE:
-            type = PGT_l1_page_table;
-            goto pin_page;
-
-        case MMUEXT_PIN_L2_TABLE:
-            type = PGT_l2_page_table;
-            goto pin_page;
-
-        case MMUEXT_PIN_L3_TABLE:
-            type = PGT_l3_page_table;
-            goto pin_page;
-
-        case MMUEXT_PIN_L4_TABLE:
-            if ( is_pv_32bit_domain(pg_owner) )
-                break;
-            type = PGT_l4_page_table;
-
-        pin_page:
-            /* Ignore pinning of invalid paging levels. */
-            if ( (op.cmd - MMUEXT_PIN_L1_TABLE) > (CONFIG_PAGING_LEVELS - 1) )
-                break;
-
-            if ( paging_mode_refcounts(pg_owner) )
-                break;
-
-            page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC);
-            if ( unlikely(!page) )
-            {
-                rc = -EINVAL;
-                break;
-            }
-
-            rc = get_page_type_preemptible(page, type);
-            if ( unlikely(rc) )
-            {
-                if ( rc == -EINTR )
-                    rc = -ERESTART;
-                else if ( rc != -ERESTART )
-                    gdprintk(XENLOG_WARNING,
-                             "Error %d while pinning mfn %" PRI_mfn "\n",
-                            rc, page_to_mfn(page));
-                if ( page != curr->arch.old_guest_table )
-                    put_page(page);
-                break;
-            }
-
-            rc = xsm_memory_pin_page(XSM_HOOK, currd, pg_owner, page);
-            if ( !rc && unlikely(test_and_set_bit(_PGT_pinned,
-                                                  &page->u.inuse.type_info)) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "mfn %" PRI_mfn " already pinned\n", page_to_mfn(page));
-                rc = -EINVAL;
-            }
-
-            if ( unlikely(rc) )
-                goto pin_drop;
-
-            /* A page is dirtied when its pin status is set. */
-            paging_mark_dirty(pg_owner, _mfn(page_to_mfn(page)));
-
-            /* We can race domain destruction (domain_relinquish_resources). */
-            if ( unlikely(pg_owner != currd) )
-            {
-                bool drop_ref;
-
-                spin_lock(&pg_owner->page_alloc_lock);
-                drop_ref = (pg_owner->is_dying &&
-                            test_and_clear_bit(_PGT_pinned,
-                                               &page->u.inuse.type_info));
-                spin_unlock(&pg_owner->page_alloc_lock);
-                if ( drop_ref )
-                {
-        pin_drop:
-                    if ( type == PGT_l1_page_table )
-                        put_page_and_type(page);
-                    else
-                        curr->arch.old_guest_table = page;
-                }
-            }
-            break;
-
-        case MMUEXT_UNPIN_TABLE:
-            if ( paging_mode_refcounts(pg_owner) )
-                break;
-
-            page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC);
-            if ( unlikely(!page) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "mfn %" PRI_mfn " bad, or bad owner d%d\n",
-                         op.arg1.mfn, pg_owner->domain_id);
-                rc = -EINVAL;
-                break;
-            }
-
-            if ( !test_and_clear_bit(_PGT_pinned, &page->u.inuse.type_info) )
-            {
-                put_page(page);
-                gdprintk(XENLOG_WARNING,
-                         "mfn %" PRI_mfn " not pinned\n", op.arg1.mfn);
-                rc = -EINVAL;
-                break;
-            }
-
-            switch ( rc = put_page_and_type_preemptible(page) )
-            {
-            case -EINTR:
-            case -ERESTART:
-                curr->arch.old_guest_table = page;
-                rc = 0;
-                break;
-            default:
-                BUG_ON(rc);
-                break;
-            }
-            put_page(page);
-
-            /* A page is dirtied when its pin status is cleared. */
-            paging_mark_dirty(pg_owner, _mfn(page_to_mfn(page)));
-            break;
-
-        case MMUEXT_NEW_BASEPTR:
-            if ( unlikely(currd != pg_owner) )
-                rc = -EPERM;
-            else if ( unlikely(paging_mode_translate(currd)) )
-                rc = -EINVAL;
-            else
-                rc = pv_new_guest_cr3(op.arg1.mfn);
-            break;
-
-        case MMUEXT_NEW_USER_BASEPTR: {
-            unsigned long old_mfn;
-
-            if ( unlikely(currd != pg_owner) )
-                rc = -EPERM;
-            else if ( unlikely(paging_mode_translate(currd)) )
-                rc = -EINVAL;
-            if ( unlikely(rc) )
-                break;
-
-            old_mfn = pagetable_get_pfn(curr->arch.guest_table_user);
-            /*
-             * This is particularly important when getting restarted after the
-             * previous attempt got preempted in the put-old-MFN phase.
-             */
-            if ( old_mfn == op.arg1.mfn )
-                break;
-
-            if ( op.arg1.mfn != 0 )
-            {
-                rc = get_page_and_type_from_mfn(
-                    _mfn(op.arg1.mfn), PGT_root_page_table, currd, 0, true);
-
-                if ( unlikely(rc) )
-                {
-                    if ( rc == -EINTR )
-                        rc = -ERESTART;
-                    else if ( rc != -ERESTART )
-                        gdprintk(XENLOG_WARNING,
-                                 "Error %d installing new mfn %" PRI_mfn "\n",
-                                 rc, op.arg1.mfn);
-                    break;
-                }
-
-                if ( VM_ASSIST(currd, m2p_strict) )
-                    zap_ro_mpt(op.arg1.mfn);
-            }
-
-            curr->arch.guest_table_user = pagetable_from_pfn(op.arg1.mfn);
-
-            if ( old_mfn != 0 )
-            {
-                page = mfn_to_page(old_mfn);
-
-                switch ( rc = put_page_and_type_preemptible(page) )
-                {
-                case -EINTR:
-                    rc = -ERESTART;
-                    /* fallthrough */
-                case -ERESTART:
-                    curr->arch.old_guest_table = page;
-                    break;
-                default:
-                    BUG_ON(rc);
-                    break;
-                }
-            }
-
-            break;
-        }
-
-        case MMUEXT_TLB_FLUSH_LOCAL:
-            if ( likely(currd == pg_owner) )
-                flush_tlb_local();
-            else
-                rc = -EPERM;
-            break;
-
-        case MMUEXT_INVLPG_LOCAL:
-            if ( unlikely(currd != pg_owner) )
-                rc = -EPERM;
-            else
-                paging_invlpg(curr, op.arg1.linear_addr);
-            break;
-
-        case MMUEXT_TLB_FLUSH_MULTI:
-        case MMUEXT_INVLPG_MULTI:
-        {
-            cpumask_t *mask = this_cpu(scratch_cpumask);
-
-            if ( unlikely(currd != pg_owner) )
-                rc = -EPERM;
-            else if ( unlikely(vcpumask_to_pcpumask(currd,
-                                   guest_handle_to_param(op.arg2.vcpumask,
-                                                         const_void),
-                                   mask)) )
-                rc = -EINVAL;
-            if ( unlikely(rc) )
-                break;
-
-            if ( op.cmd == MMUEXT_TLB_FLUSH_MULTI )
-                flush_tlb_mask(mask);
-            else if ( __addr_ok(op.arg1.linear_addr) )
-                flush_tlb_one_mask(mask, op.arg1.linear_addr);
-            break;
-        }
-
-        case MMUEXT_TLB_FLUSH_ALL:
-            if ( likely(currd == pg_owner) )
-                flush_tlb_mask(currd->domain_dirty_cpumask);
-            else
-                rc = -EPERM;
-            break;
-
-        case MMUEXT_INVLPG_ALL:
-            if ( unlikely(currd != pg_owner) )
-                rc = -EPERM;
-            else if ( __addr_ok(op.arg1.linear_addr) )
-                flush_tlb_one_mask(currd->domain_dirty_cpumask,
-                                   op.arg1.linear_addr);
-            break;
-
-        case MMUEXT_FLUSH_CACHE:
-            if ( unlikely(currd != pg_owner) )
-                rc = -EPERM;
-            else if ( unlikely(!cache_flush_permitted(currd)) )
-                rc = -EACCES;
-            else
-                wbinvd();
-            break;
-
-        case MMUEXT_FLUSH_CACHE_GLOBAL:
-            if ( unlikely(currd != pg_owner) )
-                rc = -EPERM;
-            else if ( likely(cache_flush_permitted(currd)) )
-            {
-                unsigned int cpu;
-                cpumask_t *mask = this_cpu(scratch_cpumask);
-
-                cpumask_clear(mask);
-                for_each_online_cpu(cpu)
-                    if ( !cpumask_intersects(mask,
-                                             per_cpu(cpu_sibling_mask, cpu)) )
-                        __cpumask_set_cpu(cpu, mask);
-                flush_mask(mask, FLUSH_CACHE);
-            }
-            else
-                rc = -EINVAL;
-            break;
-
-        case MMUEXT_SET_LDT:
-        {
-            unsigned int ents = op.arg2.nr_ents;
-            unsigned long ptr = ents ? op.arg1.linear_addr : 0;
-
-            if ( unlikely(currd != pg_owner) )
-                rc = -EPERM;
-            else if ( paging_mode_external(currd) )
-                rc = -EINVAL;
-            else if ( ((ptr & (PAGE_SIZE - 1)) != 0) || !__addr_ok(ptr) ||
-                      (ents > 8192) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Bad args to SET_LDT: ptr=%lx, ents=%x\n", ptr, ents);
-                rc = -EINVAL;
-            }
-            else if ( (curr->arch.pv_vcpu.ldt_ents != ents) ||
-                      (curr->arch.pv_vcpu.ldt_base != ptr) )
-            {
-                pv_invalidate_shadow_ldt(curr, false);
-                flush_tlb_local();
-                curr->arch.pv_vcpu.ldt_base = ptr;
-                curr->arch.pv_vcpu.ldt_ents = ents;
-                load_LDT(curr);
-            }
-            break;
-        }
-
-        case MMUEXT_CLEAR_PAGE:
-            page = get_page_from_gfn(pg_owner, op.arg1.mfn, &p2mt, P2M_ALLOC);
-            if ( unlikely(p2mt != p2m_ram_rw) && page )
-            {
-                put_page(page);
-                page = NULL;
-            }
-            if ( !page || !get_page_type(page, PGT_writable_page) )
-            {
-                if ( page )
-                    put_page(page);
-                gdprintk(XENLOG_WARNING,
-                         "Error clearing mfn %" PRI_mfn "\n", op.arg1.mfn);
-                rc = -EINVAL;
-                break;
-            }
-
-            /* A page is dirtied when it's being cleared. */
-            paging_mark_dirty(pg_owner, _mfn(page_to_mfn(page)));
-
-            clear_domain_page(_mfn(page_to_mfn(page)));
-
-            put_page_and_type(page);
-            break;
-
-        case MMUEXT_COPY_PAGE:
-        {
-            struct page_info *src_page, *dst_page;
-
-            src_page = get_page_from_gfn(pg_owner, op.arg2.src_mfn, &p2mt,
-                                         P2M_ALLOC);
-            if ( unlikely(p2mt != p2m_ram_rw) && src_page )
-            {
-                put_page(src_page);
-                src_page = NULL;
-            }
-            if ( unlikely(!src_page) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Error copying from mfn %" PRI_mfn "\n",
-                         op.arg2.src_mfn);
-                rc = -EINVAL;
-                break;
-            }
-
-            dst_page = get_page_from_gfn(pg_owner, op.arg1.mfn, &p2mt,
-                                         P2M_ALLOC);
-            if ( unlikely(p2mt != p2m_ram_rw) && dst_page )
-            {
-                put_page(dst_page);
-                dst_page = NULL;
-            }
-            rc = (dst_page &&
-                  get_page_type(dst_page, PGT_writable_page)) ? 0 : -EINVAL;
-            if ( unlikely(rc) )
-            {
-                put_page(src_page);
-                if ( dst_page )
-                    put_page(dst_page);
-                gdprintk(XENLOG_WARNING,
-                         "Error copying to mfn %" PRI_mfn "\n", op.arg1.mfn);
-                break;
-            }
-
-            /* A page is dirtied when it's being copied to. */
-            paging_mark_dirty(pg_owner, _mfn(page_to_mfn(dst_page)));
-
-            copy_domain_page(_mfn(page_to_mfn(dst_page)),
-                             _mfn(page_to_mfn(src_page)));
-
-            put_page_and_type(dst_page);
-            put_page(src_page);
-            break;
-        }
-
-        case MMUEXT_MARK_SUPER:
-        case MMUEXT_UNMARK_SUPER:
-            rc = -EOPNOTSUPP;
-            break;
-
-        default:
-            rc = -ENOSYS;
-            break;
-        }
-
- done:
-        if ( unlikely(rc) )
-            break;
-
-        guest_handle_add_offset(uops, 1);
-    }
-
-    if ( rc == -ERESTART )
-    {
-        ASSERT(i < count);
-        rc = hypercall_create_continuation(
-            __HYPERVISOR_mmuext_op, "hihi",
-            uops, (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom);
-    }
-    else if ( curr->arch.old_guest_table )
-    {
-        XEN_GUEST_HANDLE_PARAM(void) null;
-
-        ASSERT(rc || i == count);
-        set_xen_guest_handle(null, NULL);
-        /*
-         * In order to have a way to communicate the final return value to
-         * our continuation, we pass this in place of "foreigndom", building
-         * on the fact that this argument isn't needed anymore.
-         */
-        rc = hypercall_create_continuation(
-                __HYPERVISOR_mmuext_op, "hihi", null,
-                MMU_UPDATE_PREEMPTED, null, rc);
-    }
-
-    put_pg_owner(pg_owner);
-
-    perfc_add(num_mmuext_ops, i);
-
-    /* Add incremental work we have done to the @done output parameter. */
-    if ( unlikely(!guest_handle_is_null(pdone)) )
-    {
-        done += i;
-        copy_to_guest(pdone, &done, 1);
-    }
-
-    return rc;
-}
-
-long do_mmu_update(
-    XEN_GUEST_HANDLE_PARAM(mmu_update_t) ureqs,
-    unsigned int count,
-    XEN_GUEST_HANDLE_PARAM(uint) pdone,
-    unsigned int foreigndom)
-{
-    struct mmu_update req;
-    void *va;
-    unsigned long gpfn, gmfn, mfn;
-    struct page_info *page;
-    unsigned int cmd, i = 0, done = 0, pt_dom;
-    struct vcpu *curr = current, *v = curr;
-    struct domain *d = v->domain, *pt_owner = d, *pg_owner;
-    struct domain_mmap_cache mapcache;
-    uint32_t xsm_needed = 0;
-    uint32_t xsm_checked = 0;
-    int rc = put_old_guest_table(curr);
-
-    if ( unlikely(rc) )
-    {
-        if ( likely(rc == -ERESTART) )
-            rc = hypercall_create_continuation(
-                     __HYPERVISOR_mmu_update, "hihi", ureqs, count, pdone,
-                     foreigndom);
-        return rc;
-    }
-
-    if ( unlikely(count == MMU_UPDATE_PREEMPTED) &&
-         likely(guest_handle_is_null(ureqs)) )
-    {
-        /*
-         * See the curr->arch.old_guest_table related
-         * hypercall_create_continuation() below.
-         */
-        return (int)foreigndom;
-    }
-
-    if ( unlikely(count & MMU_UPDATE_PREEMPTED) )
-    {
-        count &= ~MMU_UPDATE_PREEMPTED;
-        if ( unlikely(!guest_handle_is_null(pdone)) )
-            (void)copy_from_guest(&done, pdone, 1);
-    }
-    else
-        perfc_incr(calls_to_mmu_update);
-
-    if ( unlikely(!guest_handle_okay(ureqs, count)) )
-        return -EFAULT;
-
-    if ( (pt_dom = foreigndom >> 16) != 0 )
-    {
-        /* Pagetables belong to a foreign domain (PFD). */
-        if ( (pt_owner = rcu_lock_domain_by_id(pt_dom - 1)) == NULL )
-            return -ESRCH;
-
-        if ( pt_owner == d )
-            rcu_unlock_domain(pt_owner);
-        else if ( !pt_owner->vcpu || (v = pt_owner->vcpu[0]) == NULL )
-        {
-            rc = -EINVAL;
-            goto out;
-        }
-    }
-
-    if ( (pg_owner = get_pg_owner((uint16_t)foreigndom)) == NULL )
-    {
-        rc = -ESRCH;
-        goto out;
-    }
-
-    domain_mmap_cache_init(&mapcache);
-
-    for ( i = 0; i < count; i++ )
-    {
-        if ( curr->arch.old_guest_table || (i && hypercall_preempt_check()) )
-        {
-            rc = -ERESTART;
-            break;
-        }
-
-        if ( unlikely(__copy_from_guest(&req, ureqs, 1) != 0) )
-        {
-            rc = -EFAULT;
-            break;
-        }
-
-        cmd = req.ptr & (sizeof(l1_pgentry_t)-1);
-
-        switch ( cmd )
-        {
-            /*
-             * MMU_NORMAL_PT_UPDATE: Normal update to any level of page table.
-             * MMU_UPDATE_PT_PRESERVE_AD: As above but also preserve (OR)
-             * current A/D bits.
-             */
-        case MMU_NORMAL_PT_UPDATE:
-        case MMU_PT_UPDATE_PRESERVE_AD:
-        {
-            p2m_type_t p2mt;
-
-            rc = -EOPNOTSUPP;
-            if ( unlikely(paging_mode_refcounts(pt_owner)) )
-                break;
-
-            xsm_needed |= XSM_MMU_NORMAL_UPDATE;
-            if ( get_pte_flags(req.val) & _PAGE_PRESENT )
-            {
-                xsm_needed |= XSM_MMU_UPDATE_READ;
-                if ( get_pte_flags(req.val) & _PAGE_RW )
-                    xsm_needed |= XSM_MMU_UPDATE_WRITE;
-            }
-            if ( xsm_needed != xsm_checked )
-            {
-                rc = xsm_mmu_update(XSM_TARGET, d, pt_owner, pg_owner, xsm_needed);
-                if ( rc )
-                    break;
-                xsm_checked = xsm_needed;
-            }
-            rc = -EINVAL;
-
-            req.ptr -= cmd;
-            gmfn = req.ptr >> PAGE_SHIFT;
-            page = get_page_from_gfn(pt_owner, gmfn, &p2mt, P2M_ALLOC);
-
-            if ( p2m_is_paged(p2mt) )
-            {
-                ASSERT(!page);
-                p2m_mem_paging_populate(pg_owner, gmfn);
-                rc = -ENOENT;
-                break;
-            }
-
-            if ( unlikely(!page) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Could not get page for normal update\n");
-                break;
-            }
-
-            mfn = page_to_mfn(page);
-            va = map_domain_page_with_cache(mfn, &mapcache);
-            va = (void *)((unsigned long)va +
-                          (unsigned long)(req.ptr & ~PAGE_MASK));
-
-            if ( page_lock(page) )
-            {
-                switch ( page->u.inuse.type_info & PGT_type_mask )
-                {
-                case PGT_l1_page_table:
-                {
-                    l1_pgentry_t l1e = l1e_from_intpte(req.val);
-                    p2m_type_t l1e_p2mt = p2m_ram_rw;
-                    struct page_info *target = NULL;
-                    p2m_query_t q = (l1e_get_flags(l1e) & _PAGE_RW) ?
-                                        P2M_UNSHARE : P2M_ALLOC;
-
-                    if ( paging_mode_translate(pg_owner) )
-                        target = get_page_from_gfn(pg_owner, l1e_get_pfn(l1e),
-                                                   &l1e_p2mt, q);
-
-                    if ( p2m_is_paged(l1e_p2mt) )
-                    {
-                        if ( target )
-                            put_page(target);
-                        p2m_mem_paging_populate(pg_owner, l1e_get_pfn(l1e));
-                        rc = -ENOENT;
-                        break;
-                    }
-                    else if ( p2m_ram_paging_in == l1e_p2mt && !target )
-                    {
-                        rc = -ENOENT;
-                        break;
-                    }
-                    /* If we tried to unshare and failed */
-                    else if ( (q & P2M_UNSHARE) && p2m_is_shared(l1e_p2mt) )
-                    {
-                        /* We could not have obtained a page ref. */
-                        ASSERT(target == NULL);
-                        /* And mem_sharing_notify has already been called. */
-                        rc = -ENOMEM;
-                        break;
-                    }
-
-                    rc = mod_l1_entry(va, l1e, mfn,
-                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v,
-                                      pg_owner);
-                    if ( target )
-                        put_page(target);
-                }
-                break;
-                case PGT_l2_page_table:
-                    rc = mod_l2_entry(va, l2e_from_intpte(req.val), mfn,
-                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
-                    break;
-                case PGT_l3_page_table:
-                    rc = mod_l3_entry(va, l3e_from_intpte(req.val), mfn,
-                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
-                    break;
-                case PGT_l4_page_table:
-                    rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
-                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
-                break;
-                case PGT_writable_page:
-                    perfc_incr(writable_mmu_updates);
-                    if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) )
-                        rc = 0;
-                    break;
-                }
-                page_unlock(page);
-                if ( rc == -EINTR )
-                    rc = -ERESTART;
-            }
-            else if ( get_page_type(page, PGT_writable_page) )
-            {
-                perfc_incr(writable_mmu_updates);
-                if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) )
-                    rc = 0;
-                put_page_type(page);
-            }
-
-            unmap_domain_page_with_cache(va, &mapcache);
-            put_page(page);
-        }
-        break;
-
-        case MMU_MACHPHYS_UPDATE:
-            if ( unlikely(d != pt_owner) )
-            {
-                rc = -EPERM;
-                break;
-            }
-
-            if ( unlikely(paging_mode_translate(pg_owner)) )
-            {
-                rc = -EINVAL;
-                break;
-            }
-
-            mfn = req.ptr >> PAGE_SHIFT;
-            gpfn = req.val;
-
-            xsm_needed |= XSM_MMU_MACHPHYS_UPDATE;
-            if ( xsm_needed != xsm_checked )
-            {
-                rc = xsm_mmu_update(XSM_TARGET, d, NULL, pg_owner, xsm_needed);
-                if ( rc )
-                    break;
-                xsm_checked = xsm_needed;
-            }
-
-            if ( unlikely(!get_page_from_mfn(_mfn(mfn), pg_owner)) )
-            {
-                gdprintk(XENLOG_WARNING,
-                         "Could not get page for mach->phys update\n");
-                rc = -EINVAL;
-                break;
-            }
-
-            set_gpfn_from_mfn(mfn, gpfn);
-
-            paging_mark_dirty(pg_owner, _mfn(mfn));
-
-            put_page(mfn_to_page(mfn));
-            break;
-
-        default:
-            rc = -ENOSYS;
-            break;
-        }
-
-        if ( unlikely(rc) )
-            break;
-
-        guest_handle_add_offset(ureqs, 1);
-    }
-
-    if ( rc == -ERESTART )
-    {
-        ASSERT(i < count);
-        rc = hypercall_create_continuation(
-            __HYPERVISOR_mmu_update, "hihi",
-            ureqs, (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom);
-    }
-    else if ( curr->arch.old_guest_table )
-    {
-        XEN_GUEST_HANDLE_PARAM(void) null;
-
-        ASSERT(rc || i == count);
-        set_xen_guest_handle(null, NULL);
-        /*
-         * In order to have a way to communicate the final return value to
-         * our continuation, we pass this in place of "foreigndom", building
-         * on the fact that this argument isn't needed anymore.
-         */
-        rc = hypercall_create_continuation(
-                __HYPERVISOR_mmu_update, "hihi", null,
-                MMU_UPDATE_PREEMPTED, null, rc);
-    }
-
-    put_pg_owner(pg_owner);
-
-    domain_mmap_cache_destroy(&mapcache);
-
-    perfc_add(num_page_updates, i);
-
- out:
-    if ( pt_owner != d )
-        rcu_unlock_domain(pt_owner);
-
-    /* Add incremental work we have done to the @done output parameter. */
-    if ( unlikely(!guest_handle_is_null(pdone)) )
-    {
-        done += i;
-        copy_to_guest(pdone, &done, 1);
-    }
-
-    return rc;
-}
-
-int donate_page(
-    struct domain *d, struct page_info *page, unsigned int memflags)
-{
-    const struct domain *owner = dom_xen;
-
-    spin_lock(&d->page_alloc_lock);
-
-    if ( is_xen_heap_page(page) || ((owner = page_get_owner(page)) != NULL) )
-        goto fail;
-
-    if ( d->is_dying )
-        goto fail;
-
-    if ( page->count_info & ~(PGC_allocated | 1) )
-        goto fail;
-
-    if ( !(memflags & MEMF_no_refcount) )
-    {
-        if ( d->tot_pages >= d->max_pages )
-            goto fail;
-        domain_adjust_tot_pages(d, 1);
-    }
-
-    page->count_info = PGC_allocated | 1;
-    page_set_owner(page, d);
-    page_list_add_tail(page,&d->page_list);
-
-    spin_unlock(&d->page_alloc_lock);
-    return 0;
-
- fail:
-    spin_unlock(&d->page_alloc_lock);
-    gdprintk(XENLOG_WARNING, "Bad donate mfn %" PRI_mfn
-             " to d%d (owner d%d) caf=%08lx taf=%" PRtype_info "\n",
-             page_to_mfn(page), d->domain_id,
-             owner ? owner->domain_id : DOMID_INVALID,
-             page->count_info, page->u.inuse.type_info);
-    return -EINVAL;
-}
-
-int steal_page(
-    struct domain *d, struct page_info *page, unsigned int memflags)
-{
-    unsigned long x, y;
-    bool drop_dom_ref = false;
-    const struct domain *owner = dom_xen;
-
-    if ( paging_mode_external(d) )
-        return -EOPNOTSUPP;
-
-    spin_lock(&d->page_alloc_lock);
-
-    if ( is_xen_heap_page(page) || ((owner = page_get_owner(page)) != d) )
-        goto fail;
-
-    /*
-     * We require there is just one reference (PGC_allocated). We temporarily
-     * drop this reference now so that we can safely swizzle the owner.
-     */
-    y = page->count_info;
-    do {
-        x = y;
-        if ( (x & (PGC_count_mask|PGC_allocated)) != (1 | PGC_allocated) )
-            goto fail;
-        y = cmpxchg(&page->count_info, x, x & ~PGC_count_mask);
-    } while ( y != x );
-
-    /*
-     * With the sole reference dropped temporarily, no-one can update type
-     * information. Type count also needs to be zero in this case, but e.g.
-     * PGT_seg_desc_page may still have PGT_validated set, which we need to
-     * clear before transferring ownership (as validation criteria vary
-     * depending on domain type).
-     */
-    BUG_ON(page->u.inuse.type_info & (PGT_count_mask | PGT_locked |
-                                      PGT_pinned));
-    page->u.inuse.type_info = 0;
-
-    /* Swizzle the owner then reinstate the PGC_allocated reference. */
-    page_set_owner(page, NULL);
-    y = page->count_info;
-    do {
-        x = y;
-        BUG_ON((x & (PGC_count_mask|PGC_allocated)) != PGC_allocated);
-    } while ( (y = cmpxchg(&page->count_info, x, x | 1)) != x );
-
-    /* Unlink from original owner. */
-    if ( !(memflags & MEMF_no_refcount) && !domain_adjust_tot_pages(d, -1) )
-        drop_dom_ref = true;
-    page_list_del(page, &d->page_list);
+    /* Unlink from original owner. */
+    if ( !(memflags & MEMF_no_refcount) && !domain_adjust_tot_pages(d, -1) )
+        drop_dom_ref = true;
+    page_list_del(page, &d->page_list);
 
     spin_unlock(&d->page_alloc_lock);
     if ( unlikely(drop_dom_ref) )
@@ -3023,122 +1712,6 @@ int steal_page(
     return -EINVAL;
 }
 
-static int __do_update_va_mapping(
-    unsigned long va, u64 val64, unsigned long flags, struct domain *pg_owner)
-{
-    l1_pgentry_t   val = l1e_from_intpte(val64);
-    struct vcpu   *v   = current;
-    struct domain *d   = v->domain;
-    struct page_info *gl1pg;
-    l1_pgentry_t  *pl1e;
-    unsigned long  bmap_ptr, gl1mfn;
-    cpumask_t     *mask = NULL;
-    int            rc;
-
-    perfc_incr(calls_to_update_va);
-
-    rc = xsm_update_va_mapping(XSM_TARGET, d, pg_owner, val);
-    if ( rc )
-        return rc;
-
-    rc = -EINVAL;
-    pl1e = pv_map_guest_l1e(va, &gl1mfn);
-    if ( unlikely(!pl1e || !get_page_from_mfn(_mfn(gl1mfn), d)) )
-        goto out;
-
-    gl1pg = mfn_to_page(gl1mfn);
-    if ( !page_lock(gl1pg) )
-    {
-        put_page(gl1pg);
-        goto out;
-    }
-
-    if ( (gl1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
-    {
-        page_unlock(gl1pg);
-        put_page(gl1pg);
-        goto out;
-    }
-
-    rc = mod_l1_entry(pl1e, val, gl1mfn, 0, v, pg_owner);
-
-    page_unlock(gl1pg);
-    put_page(gl1pg);
-
- out:
-    if ( pl1e )
-        pv_unmap_guest_l1e(pl1e);
-
-    switch ( flags & UVMF_FLUSHTYPE_MASK )
-    {
-    case UVMF_TLB_FLUSH:
-        switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) )
-        {
-        case UVMF_LOCAL:
-            flush_tlb_local();
-            break;
-        case UVMF_ALL:
-            mask = d->domain_dirty_cpumask;
-            break;
-        default:
-            mask = this_cpu(scratch_cpumask);
-            rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr,
-                                                                     void),
-                                      mask);
-            break;
-        }
-        if ( mask )
-            flush_tlb_mask(mask);
-        break;
-
-    case UVMF_INVLPG:
-        switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) )
-        {
-        case UVMF_LOCAL:
-            paging_invlpg(v, va);
-            break;
-        case UVMF_ALL:
-            mask = d->domain_dirty_cpumask;
-            break;
-        default:
-            mask = this_cpu(scratch_cpumask);
-            rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr,
-                                                                     void),
-                                      mask);
-            break;
-        }
-        if ( mask )
-            flush_tlb_one_mask(mask, va);
-        break;
-    }
-
-    return rc;
-}
-
-long do_update_va_mapping(unsigned long va, u64 val64,
-                          unsigned long flags)
-{
-    return __do_update_va_mapping(va, val64, flags, current->domain);
-}
-
-long do_update_va_mapping_otherdomain(unsigned long va, u64 val64,
-                                      unsigned long flags,
-                                      domid_t domid)
-{
-    struct domain *pg_owner;
-    int rc;
-
-    if ( (pg_owner = get_pg_owner(domid)) == NULL )
-        return -ESRCH;
-
-    rc = __do_update_va_mapping(va, val64, flags, pg_owner);
-
-    put_pg_owner(pg_owner);
-
-    return rc;
-}
-
-
 typedef struct e820entry e820entry_t;
 DEFINE_XEN_GUEST_HANDLE(e820entry_t);
 
diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile
index 42e9d3723b..219d7d0c63 100644
--- a/xen/arch/x86/pv/Makefile
+++ b/xen/arch/x86/pv/Makefile
@@ -12,6 +12,7 @@ obj-y += hypercall.o
 obj-y += iret.o
 obj-y += misc-hypercalls.o
 obj-y += mm.o
+obj-y += mm-hypercalls.o
 obj-y += traps.o
 
 obj-bin-y += dom0_build.init.o
diff --git a/xen/arch/x86/pv/mm-hypercalls.c b/xen/arch/x86/pv/mm-hypercalls.c
new file mode 100644
index 0000000000..8e48df89b5
--- /dev/null
+++ b/xen/arch/x86/pv/mm-hypercalls.c
@@ -0,0 +1,1471 @@
+/******************************************************************************
+ * arch/x86/pv/mm-hypercalls.c
+ *
+ * Memory management hypercalls for PV guests
+ *
+ * Copyright (c) 2002-2005 K A Fraser
+ * Copyright (c) 2004 Christian Limpach
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/event.h>
+#include <xen/guest_access.h>
+
+#include <asm/hypercall.h>
+#include <asm/iocap.h>
+#include <asm/ldt.h>
+#include <asm/mm.h>
+#include <asm/p2m.h>
+#include <asm/pv/mm.h>
+#include <asm/setup.h>
+
+#include <xsm/xsm.h>
+
+#include "mm.h"
+
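+/* Resolve @domid to an RCU-locked page owner; returns NULL on failure. */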
+static struct domain *get_pg_owner(domid_t domid)
+{
+    struct domain *pg_owner = NULL, *currd = current->domain;
+
+    if ( likely(domid == DOMID_SELF) )
+    {
+        pg_owner = rcu_lock_current_domain();
+        goto out;
+    }
+
+    if ( unlikely(domid == currd->domain_id) )
+    {
+        gdprintk(XENLOG_WARNING, "Cannot specify itself as foreign domain\n");
+        goto out;
+    }
+
+    switch ( domid )
+    {
+    case DOMID_IO:
+        pg_owner = rcu_lock_domain(dom_io);
+        break;
+    case DOMID_XEN:
+        pg_owner = rcu_lock_domain(dom_xen);
+        break;
+    default:
+        if ( (pg_owner = rcu_lock_domain_by_id(domid)) == NULL )
+        {
+            gdprintk(XENLOG_WARNING, "Unknown domain d%d\n", domid);
+            break;
+        }
+        break;
+    }
+
+ out:
+    return pg_owner;
+}
+
+static void put_pg_owner(struct domain *pg_owner)
+{
+    rcu_unlock_domain(pg_owner);
+}
+
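+/* Fold a guest vCPU bitmap into the union of the vCPUs' dirty pCPU masks. */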
+static inline int vcpumask_to_pcpumask(
+    struct domain *d, XEN_GUEST_HANDLE_PARAM(const_void) bmap, cpumask_t *pmask)
+{
+    unsigned int vcpu_id, vcpu_bias, offs;
+    unsigned long vmask;
+    struct vcpu *v;
+    bool is_native = !is_pv_32bit_domain(d);
+
+    cpumask_clear(pmask);
+    for ( vmask = 0, offs = 0; ; ++offs )
+    {
+        vcpu_bias = offs * (is_native ? BITS_PER_LONG : 32);
+        if ( vcpu_bias >= d->max_vcpus )
+            return 0;
+
+        if ( unlikely(is_native ?
+                      copy_from_guest_offset(&vmask, bmap, offs, 1) :
+                      copy_from_guest_offset((unsigned int *)&vmask, bmap,
+                                             offs, 1)) )
+        {
+            cpumask_clear(pmask);
+            return -EFAULT;
+        }
+
+        while ( vmask )
+        {
+            vcpu_id = find_first_set_bit(vmask);
+            vmask &= ~(1UL << vcpu_id);
+            vcpu_id += vcpu_bias;
+            if ( vcpu_id >= d->max_vcpus )
+                return 0;
+            if ( (v = d->vcpu[vcpu_id]) != NULL )
+                cpumask_or(pmask, pmask, v->vcpu_dirty_cpumask);
+        }
+    }
+}
+
+/*
+ * PTE flags that a guest may change without re-validating the PTE.
+ * All other bits affect translation, caching, or Xen's safety.
+ */
+#define FASTPATH_FLAG_WHITELIST                                     \
+    (_PAGE_NX_BIT | _PAGE_AVAIL_HIGH | _PAGE_AVAIL | _PAGE_GLOBAL | \
+     _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_USER)
+
+/* Update the L1 entry at pl1e to new value nl1e. */
+static int mod_l1_entry(l1_pgentry_t *pl1e, l1_pgentry_t nl1e,
+                        unsigned long gl1mfn, int preserve_ad,
+                        struct vcpu *pt_vcpu, struct domain *pg_dom)
+{
+    l1_pgentry_t ol1e;
+    struct domain *pt_dom = pt_vcpu->domain;
+    int rc = 0;
+
+    if ( unlikely(__copy_from_user(&ol1e, pl1e, sizeof(ol1e)) != 0) )
+        return -EFAULT;
+
+    ASSERT(!paging_mode_refcounts(pt_dom));
+
+    if ( l1e_get_flags(nl1e) & _PAGE_PRESENT )
+    {
+        /* Translate foreign guest addresses. */
+        struct page_info *page = NULL;
+
+        if ( unlikely(l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom)) )
+        {
+            gdprintk(XENLOG_WARNING, "Bad L1 flags %x\n",
+                    l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom));
+            return -EINVAL;
+        }
+
+        if ( paging_mode_translate(pg_dom) )
+        {
+            page = get_page_from_gfn(pg_dom, l1e_get_pfn(nl1e), NULL, P2M_ALLOC);
+            if ( !page )
+                return -EINVAL;
+            nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(nl1e));
+        }
+
+        /* Fast path for sufficiently-similar mappings. */
+        if ( !l1e_has_changed(ol1e, nl1e, ~FASTPATH_FLAG_WHITELIST) )
+        {
+            adjust_guest_l1e(nl1e, pt_dom);
+            rc = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
+                              preserve_ad);
+            if ( page )
+                put_page(page);
+            return rc ? 0 : -EBUSY;
+        }
+
+        switch ( rc = get_page_from_l1e(nl1e, pt_dom, pg_dom) )
+        {
+        default:
+            if ( page )
+                put_page(page);
+            return rc;
+        case 0:
+            break;
+        case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+            ASSERT(!(rc & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+            l1e_flip_flags(nl1e, rc);
+            rc = 0;
+            break;
+        }
+        if ( page )
+            put_page(page);
+
+        adjust_guest_l1e(nl1e, pt_dom);
+        if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
+                                    preserve_ad)) )
+        {
+            ol1e = nl1e;
+            rc = -EBUSY;
+        }
+    }
+    else if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
+                                     preserve_ad)) )
+    {
+        return -EBUSY;
+    }
+
+    put_page_from_l1e(ol1e, pt_dom);
+    return rc;
+}
+
+
+/* Update the L2 entry at pl2e to new value nl2e. pl2e is within frame pfn. */
+static int mod_l2_entry(l2_pgentry_t *pl2e, l2_pgentry_t nl2e,
+                        unsigned long pfn, int preserve_ad, struct vcpu *vcpu)
+{
+    l2_pgentry_t ol2e;
+    struct domain *d = vcpu->domain;
+    struct page_info *l2pg = mfn_to_page(pfn);
+    unsigned long type = l2pg->u.inuse.type_info;
+    int rc = 0;
+
+    if ( unlikely(!is_guest_l2_slot(d, type, pgentry_ptr_to_slot(pl2e))) )
+    {
+        gdprintk(XENLOG_WARNING, "L2 update in Xen-private area, slot %#lx\n",
+                 pgentry_ptr_to_slot(pl2e));
+        return -EPERM;
+    }
+
+    if ( unlikely(__copy_from_user(&ol2e, pl2e, sizeof(ol2e)) != 0) )
+        return -EFAULT;
+
+    if ( l2e_get_flags(nl2e) & _PAGE_PRESENT )
+    {
+        if ( unlikely(l2e_get_flags(nl2e) & L2_DISALLOW_MASK) )
+        {
+            gdprintk(XENLOG_WARNING, "Bad L2 flags %x\n",
+                    l2e_get_flags(nl2e) & L2_DISALLOW_MASK);
+            return -EINVAL;
+        }
+
+        /* Fast path for sufficiently-similar mappings. */
+        if ( !l2e_has_changed(ol2e, nl2e, ~FASTPATH_FLAG_WHITELIST) )
+        {
+            adjust_guest_l2e(nl2e, d);
+            if ( UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu, preserve_ad) )
+                return 0;
+            return -EBUSY;
+        }
+
+        if ( unlikely((rc = get_page_from_l2e(nl2e, pfn, d)) < 0) )
+            return rc;
+
+        adjust_guest_l2e(nl2e, d);
+        if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu,
+                                    preserve_ad)) )
+        {
+            ol2e = nl2e;
+            rc = -EBUSY;
+        }
+    }
+    else if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu,
+                                     preserve_ad)) )
+    {
+        return -EBUSY;
+    }
+
+    put_page_from_l2e(ol2e, pfn);
+    return rc;
+}
+
+/* Update the L3 entry at pl3e to new value nl3e. pl3e is within frame pfn. */
+static int mod_l3_entry(l3_pgentry_t *pl3e, l3_pgentry_t nl3e,
+                        unsigned long pfn, int preserve_ad, struct vcpu *vcpu)
+{
+    l3_pgentry_t ol3e;
+    struct domain *d = vcpu->domain;
+    int rc = 0;
+
+    if ( unlikely(!is_guest_l3_slot(pgentry_ptr_to_slot(pl3e))) )
+    {
+        gdprintk(XENLOG_WARNING, "L3 update in Xen-private area, slot %#lx\n",
+                 pgentry_ptr_to_slot(pl3e));
+        return -EINVAL;
+    }
+
+    /*
+     * Disallow updates to final L3 slot. It contains Xen mappings, and it
+     * would be a pain to ensure they remain continuously valid throughout.
+     */
+    if ( is_pv_32bit_domain(d) && (pgentry_ptr_to_slot(pl3e) >= 3) )
+        return -EINVAL;
+
+    if ( unlikely(__copy_from_user(&ol3e, pl3e, sizeof(ol3e)) != 0) )
+        return -EFAULT;
+
+    if ( l3e_get_flags(nl3e) & _PAGE_PRESENT )
+    {
+        if ( unlikely(l3e_get_flags(nl3e) & l3_disallow_mask(d)) )
+        {
+            gdprintk(XENLOG_WARNING, "Bad L3 flags %x\n",
+                    l3e_get_flags(nl3e) & l3_disallow_mask(d));
+            return -EINVAL;
+        }
+
+        /* Fast path for sufficiently-similar mappings. */
+        if ( !l3e_has_changed(ol3e, nl3e, ~FASTPATH_FLAG_WHITELIST) )
+        {
+            adjust_guest_l3e(nl3e, d);
+            rc = UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu, preserve_ad);
+            return rc ? 0 : -EFAULT;
+        }
+
+        rc = get_page_from_l3e(nl3e, pfn, d, 0);
+        if ( unlikely(rc < 0) )
+            return rc;
+        rc = 0;
+
+        adjust_guest_l3e(nl3e, d);
+        if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu,
+                                    preserve_ad)) )
+        {
+            ol3e = nl3e;
+            rc = -EFAULT;
+        }
+    }
+    else if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu,
+                                     preserve_ad)) )
+    {
+        return -EFAULT;
+    }
+
+    if ( likely(rc == 0) )
+        if ( !pv_create_pae_xen_mappings(d, pl3e) )
+            BUG();
+
+    put_page_from_l3e(ol3e, pfn, 0, 1);
+    return rc;
+}
+
+/* Update the L4 entry at pl4e to new value nl4e. pl4e is within frame pfn. */
+static int mod_l4_entry(l4_pgentry_t *pl4e, l4_pgentry_t nl4e,
+                        unsigned long pfn, int preserve_ad, struct vcpu *vcpu)
+{
+    struct domain *d = vcpu->domain;
+    l4_pgentry_t ol4e;
+    int rc = 0;
+
+    if ( unlikely(!is_guest_l4_slot(d, pgentry_ptr_to_slot(pl4e))) )
+    {
+        gdprintk(XENLOG_WARNING, "L4 update in Xen-private area, slot %#lx\n",
+                 pgentry_ptr_to_slot(pl4e));
+        return -EINVAL;
+    }
+
+    if ( unlikely(__copy_from_user(&ol4e, pl4e, sizeof(ol4e)) != 0) )
+        return -EFAULT;
+
+    if ( l4e_get_flags(nl4e) & _PAGE_PRESENT )
+    {
+        if ( unlikely(l4e_get_flags(nl4e) & L4_DISALLOW_MASK) )
+        {
+            gdprintk(XENLOG_WARNING, "Bad L4 flags %x\n",
+                    l4e_get_flags(nl4e) & L4_DISALLOW_MASK);
+            return -EINVAL;
+        }
+
+        /* Fast path for sufficiently-similar mappings. */
+        if ( !l4e_has_changed(ol4e, nl4e, ~FASTPATH_FLAG_WHITELIST) )
+        {
+            adjust_guest_l4e(nl4e, d);
+            rc = UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu, preserve_ad);
+            return rc ? 0 : -EFAULT;
+        }
+
+        rc = get_page_from_l4e(nl4e, pfn, d, 0);
+        if ( unlikely(rc < 0) )
+            return rc;
+        rc = 0;
+
+        adjust_guest_l4e(nl4e, d);
+        if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu,
+                                    preserve_ad)) )
+        {
+            ol4e = nl4e;
+            rc = -EFAULT;
+        }
+    }
+    else if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu,
+                                     preserve_ad)) )
+    {
+        return -EFAULT;
+    }
+
+    put_page_from_l4e(ol4e, pfn, 0, 1);
+    return rc;
+}
+
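+/* Install @mfn as the current vCPU's new top-level page table (guest CR3). */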
+int pv_new_guest_cr3(unsigned long mfn)
+{
+    struct vcpu *curr = current;
+    struct domain *currd = curr->domain;
+    int rc;
+    unsigned long old_base_mfn;
+
+    if ( is_pv_32bit_domain(currd) )
+    {
+        unsigned long gt_mfn = pagetable_get_pfn(curr->arch.guest_table);
+        l4_pgentry_t *pl4e = map_domain_page(_mfn(gt_mfn));
+
+        rc = mod_l4_entry(pl4e,
+                          l4e_from_pfn(mfn,
+                                       (_PAGE_PRESENT | _PAGE_RW |
+                                        _PAGE_USER | _PAGE_ACCESSED)),
+                          gt_mfn, 0, curr);
+        unmap_domain_page(pl4e);
+        switch ( rc )
+        {
+        case 0:
+            break;
+        case -EINTR:
+        case -ERESTART:
+            return -ERESTART;
+        default:
+            gdprintk(XENLOG_WARNING,
+                     "Error while installing new compat baseptr %" PRI_mfn "\n",
+                     mfn);
+            return rc;
+        }
+
+        pv_invalidate_shadow_ldt(curr, false);
+        write_ptbase(curr);
+
+        return 0;
+    }
+
+    rc = put_old_guest_table(curr);
+    if ( unlikely(rc) )
+        return rc;
+
+    old_base_mfn = pagetable_get_pfn(curr->arch.guest_table);
+    /*
+     * This is particularly important when getting restarted after the
+     * previous attempt got preempted in the put-old-MFN phase.
+     */
+    if ( old_base_mfn == mfn )
+    {
+        write_ptbase(curr);
+        return 0;
+    }
+
+    rc = paging_mode_refcounts(currd)
+         ? (get_page_from_mfn(_mfn(mfn), currd) ? 0 : -EINVAL)
+         : get_page_and_type_from_mfn(_mfn(mfn), PGT_root_page_table,
+                                      currd, 0, true);
+    switch ( rc )
+    {
+    case 0:
+        break;
+    case -EINTR:
+    case -ERESTART:
+        return -ERESTART;
+    default:
+        gdprintk(XENLOG_WARNING,
+                 "Error while installing new baseptr %" PRI_mfn "\n", mfn);
+        return rc;
+    }
+
+    pv_invalidate_shadow_ldt(curr, false);
+
+    if ( !VM_ASSIST(currd, m2p_strict) && !paging_mode_refcounts(currd) )
+        fill_ro_mpt(mfn);
+    curr->arch.guest_table = pagetable_from_pfn(mfn);
+    update_cr3(curr);
+
+    write_ptbase(curr);
+
+    if ( likely(old_base_mfn != 0) )
+    {
+        struct page_info *page = mfn_to_page(old_base_mfn);
+
+        if ( paging_mode_refcounts(currd) )
+            put_page(page);
+        else
+            switch ( rc = put_page_and_type_preemptible(page) )
+            {
+            case -EINTR:
+                rc = -ERESTART;
+                /* fallthrough */
+            case -ERESTART:
+                curr->arch.old_guest_table = page;
+                break;
+            default:
+                BUG_ON(rc);
+                break;
+            }
+    }
+
+    return rc;
+}
+
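+/*
+ * Process a batch of mmu_update requests. The low bits of each request
+ * pointer encode the command: a page-table entry update (optionally
+ * preserving A/D bits) or a machine-to-phys table update.
+ */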
+long do_mmu_update(XEN_GUEST_HANDLE_PARAM(mmu_update_t) ureqs,
+                   unsigned int count, XEN_GUEST_HANDLE_PARAM(uint) pdone,
+                   unsigned int foreigndom)
+{
+    struct mmu_update req;
+    void *va;
+    unsigned long gpfn, gmfn, mfn;
+    struct page_info *page;
+    unsigned int cmd, i = 0, done = 0, pt_dom;
+    struct vcpu *curr = current, *v = curr;
+    struct domain *d = v->domain, *pt_owner = d, *pg_owner;
+    struct domain_mmap_cache mapcache;
+    uint32_t xsm_needed = 0;
+    uint32_t xsm_checked = 0;
+    int rc = put_old_guest_table(curr);
+
+    if ( unlikely(rc) )
+    {
+        if ( likely(rc == -ERESTART) )
+            rc = hypercall_create_continuation(
+                     __HYPERVISOR_mmu_update, "hihi", ureqs, count, pdone,
+                     foreigndom);
+        return rc;
+    }
+
+    if ( unlikely(count == MMU_UPDATE_PREEMPTED) &&
+         likely(guest_handle_is_null(ureqs)) )
+    {
+        /*
+         * See the curr->arch.old_guest_table related
+         * hypercall_create_continuation() below.
+         */
+        return (int)foreigndom;
+    }
+
+    if ( unlikely(count & MMU_UPDATE_PREEMPTED) )
+    {
+        count &= ~MMU_UPDATE_PREEMPTED;
+        if ( unlikely(!guest_handle_is_null(pdone)) )
+            (void)copy_from_guest(&done, pdone, 1);
+    }
+    else
+        perfc_incr(calls_to_mmu_update);
+
+    if ( unlikely(!guest_handle_okay(ureqs, count)) )
+        return -EFAULT;
+
+    if ( (pt_dom = foreigndom >> 16) != 0 )
+    {
+        /* Pagetables belong to a foreign domain (PFD). */
+        if ( (pt_owner = rcu_lock_domain_by_id(pt_dom - 1)) == NULL )
+            return -ESRCH;
+
+        if ( pt_owner == d )
+            rcu_unlock_domain(pt_owner);
+        else if ( !pt_owner->vcpu || (v = pt_owner->vcpu[0]) == NULL )
+        {
+            rc = -EINVAL;
+            goto out;
+        }
+    }
+
+    if ( (pg_owner = get_pg_owner((uint16_t)foreigndom)) == NULL )
+    {
+        rc = -ESRCH;
+        goto out;
+    }
+
+    domain_mmap_cache_init(&mapcache);
+
+    for ( i = 0; i < count; i++ )
+    {
+        if ( curr->arch.old_guest_table || (i && hypercall_preempt_check()) )
+        {
+            rc = -ERESTART;
+            break;
+        }
+
+        if ( unlikely(__copy_from_guest(&req, ureqs, 1) != 0) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        cmd = req.ptr & (sizeof(l1_pgentry_t)-1);
+
+        switch ( cmd )
+        {
+            /*
+             * MMU_NORMAL_PT_UPDATE: Normal update to any level of page table.
+             * MMU_PT_UPDATE_PRESERVE_AD: As above but also preserve (OR)
+             * current A/D bits.
+             */
+        case MMU_NORMAL_PT_UPDATE:
+        case MMU_PT_UPDATE_PRESERVE_AD:
+        {
+            p2m_type_t p2mt;
+
+            rc = -EOPNOTSUPP;
+            if ( unlikely(paging_mode_refcounts(pt_owner)) )
+                break;
+
+            xsm_needed |= XSM_MMU_NORMAL_UPDATE;
+            if ( get_pte_flags(req.val) & _PAGE_PRESENT )
+            {
+                xsm_needed |= XSM_MMU_UPDATE_READ;
+                if ( get_pte_flags(req.val) & _PAGE_RW )
+                    xsm_needed |= XSM_MMU_UPDATE_WRITE;
+            }
+            if ( xsm_needed != xsm_checked )
+            {
+                rc = xsm_mmu_update(XSM_TARGET, d, pt_owner, pg_owner, xsm_needed);
+                if ( rc )
+                    break;
+                xsm_checked = xsm_needed;
+            }
+            rc = -EINVAL;
+
+            req.ptr -= cmd;
+            gmfn = req.ptr >> PAGE_SHIFT;
+            page = get_page_from_gfn(pt_owner, gmfn, &p2mt, P2M_ALLOC);
+
+            if ( p2m_is_paged(p2mt) )
+            {
+                ASSERT(!page);
+                p2m_mem_paging_populate(pg_owner, gmfn);
+                rc = -ENOENT;
+                break;
+            }
+
+            if ( unlikely(!page) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Could not get page for normal update\n");
+                break;
+            }
+
+            mfn = page_to_mfn(page);
+            va = map_domain_page_with_cache(mfn, &mapcache);
+            va = (void *)((unsigned long)va +
+                          (unsigned long)(req.ptr & ~PAGE_MASK));
+
+            if ( page_lock(page) )
+            {
+                switch ( page->u.inuse.type_info & PGT_type_mask )
+                {
+                case PGT_l1_page_table:
+                {
+                    l1_pgentry_t l1e = l1e_from_intpte(req.val);
+                    p2m_type_t l1e_p2mt = p2m_ram_rw;
+                    struct page_info *target = NULL;
+                    p2m_query_t q = (l1e_get_flags(l1e) & _PAGE_RW) ?
+                                        P2M_UNSHARE : P2M_ALLOC;
+
+                    if ( paging_mode_translate(pg_owner) )
+                        target = get_page_from_gfn(pg_owner, l1e_get_pfn(l1e),
+                                                   &l1e_p2mt, q);
+
+                    if ( p2m_is_paged(l1e_p2mt) )
+                    {
+                        if ( target )
+                            put_page(target);
+                        p2m_mem_paging_populate(pg_owner, l1e_get_pfn(l1e));
+                        rc = -ENOENT;
+                        break;
+                    }
+                    else if ( p2m_ram_paging_in == l1e_p2mt && !target )
+                    {
+                        rc = -ENOENT;
+                        break;
+                    }
+                    /* If we tried to unshare and failed */
+                    else if ( (q & P2M_UNSHARE) && p2m_is_shared(l1e_p2mt) )
+                    {
+                        /* We could not have obtained a page ref. */
+                        ASSERT(target == NULL);
+                        /* And mem_sharing_notify has already been called. */
+                        rc = -ENOMEM;
+                        break;
+                    }
+
+                    rc = mod_l1_entry(va, l1e, mfn,
+                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v,
+                                      pg_owner);
+                    if ( target )
+                        put_page(target);
+                }
+                break;
+                case PGT_l2_page_table:
+                    rc = mod_l2_entry(va, l2e_from_intpte(req.val), mfn,
+                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
+                    break;
+                case PGT_l3_page_table:
+                    rc = mod_l3_entry(va, l3e_from_intpte(req.val), mfn,
+                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
+                    break;
+                case PGT_l4_page_table:
+                    rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
+                                      cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
+                    break;
+                case PGT_writable_page:
+                    perfc_incr(writable_mmu_updates);
+                    if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) )
+                        rc = 0;
+                    break;
+                }
+                page_unlock(page);
+                if ( rc == -EINTR )
+                    rc = -ERESTART;
+            }
+            else if ( get_page_type(page, PGT_writable_page) )
+            {
+                perfc_incr(writable_mmu_updates);
+                if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) )
+                    rc = 0;
+                put_page_type(page);
+            }
+
+            unmap_domain_page_with_cache(va, &mapcache);
+            put_page(page);
+        }
+        break;
+
+        case MMU_MACHPHYS_UPDATE:
+            if ( unlikely(d != pt_owner) )
+            {
+                rc = -EPERM;
+                break;
+            }
+
+            if ( unlikely(paging_mode_translate(pg_owner)) )
+            {
+                rc = -EINVAL;
+                break;
+            }
+
+            mfn = req.ptr >> PAGE_SHIFT;
+            gpfn = req.val;
+
+            xsm_needed |= XSM_MMU_MACHPHYS_UPDATE;
+            if ( xsm_needed != xsm_checked )
+            {
+                rc = xsm_mmu_update(XSM_TARGET, d, NULL, pg_owner, xsm_needed);
+                if ( rc )
+                    break;
+                xsm_checked = xsm_needed;
+            }
+
+            if ( unlikely(!get_page_from_mfn(_mfn(mfn), pg_owner)) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Could not get page for mach->phys update\n");
+                rc = -EINVAL;
+                break;
+            }
+
+            set_gpfn_from_mfn(mfn, gpfn);
+
+            paging_mark_dirty(pg_owner, _mfn(mfn));
+
+            put_page(mfn_to_page(mfn));
+            break;
+
+        default:
+            rc = -ENOSYS;
+            break;
+        }
+
+        if ( unlikely(rc) )
+            break;
+
+        guest_handle_add_offset(ureqs, 1);
+    }
+
+    if ( rc == -ERESTART )
+    {
+        ASSERT(i < count);
+        rc = hypercall_create_continuation(
+            __HYPERVISOR_mmu_update, "hihi",
+            ureqs, (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom);
+    }
+    else if ( curr->arch.old_guest_table )
+    {
+        XEN_GUEST_HANDLE_PARAM(void) null;
+
+        ASSERT(rc || i == count);
+        set_xen_guest_handle(null, NULL);
+        /*
+         * In order to have a way to communicate the final return value to
+         * our continuation, we pass this in place of "foreigndom", building
+         * on the fact that this argument isn't needed anymore.
+         */
+        rc = hypercall_create_continuation(
+                __HYPERVISOR_mmu_update, "hihi", null,
+                MMU_UPDATE_PREEMPTED, null, rc);
+    }
+
+    put_pg_owner(pg_owner);
+
+    domain_mmap_cache_destroy(&mapcache);
+
+    perfc_add(num_page_updates, i);
+
+ out:
+    if ( pt_owner != d )
+        rcu_unlock_domain(pt_owner);
+
+    /* Add incremental work we have done to the @done output parameter. */
+    if ( unlikely(!guest_handle_is_null(pdone)) )
+    {
+        done += i;
+        copy_to_guest(pdone, &done, 1);
+    }
+
+    return rc;
+}
+
+static int __do_update_va_mapping(unsigned long va, u64 val64,
+                                  unsigned long flags,
+                                  struct domain *pg_owner)
+{
+    l1_pgentry_t   val = l1e_from_intpte(val64);
+    struct vcpu *curr = current;
+    struct domain *currd = curr->domain;
+    struct page_info *gl1pg;
+    l1_pgentry_t  *pl1e;
+    unsigned long  bmap_ptr, gl1mfn;
+    cpumask_t     *mask = NULL;
+    int            rc;
+
+    perfc_incr(calls_to_update_va);
+
+    rc = xsm_update_va_mapping(XSM_TARGET, currd, pg_owner, val);
+    if ( rc )
+        return rc;
+
+    rc = -EINVAL;
+    pl1e = pv_map_guest_l1e(va, &gl1mfn);
+    if ( unlikely(!pl1e || !get_page_from_mfn(_mfn(gl1mfn), currd)) )
+        goto out;
+
+    gl1pg = mfn_to_page(gl1mfn);
+    if ( !page_lock(gl1pg) )
+    {
+        put_page(gl1pg);
+        goto out;
+    }
+
+    if ( (gl1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+    {
+        page_unlock(gl1pg);
+        put_page(gl1pg);
+        goto out;
+    }
+
+    rc = mod_l1_entry(pl1e, val, gl1mfn, 0, curr, pg_owner);
+
+    page_unlock(gl1pg);
+    put_page(gl1pg);
+
+ out:
+    if ( pl1e )
+        pv_unmap_guest_l1e(pl1e);
+
+    switch ( flags & UVMF_FLUSHTYPE_MASK )
+    {
+    case UVMF_TLB_FLUSH:
+        switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) )
+        {
+        case UVMF_LOCAL:
+            flush_tlb_local();
+            break;
+        case UVMF_ALL:
+            mask = currd->domain_dirty_cpumask;
+            break;
+        default:
+            mask = this_cpu(scratch_cpumask);
+            rc = vcpumask_to_pcpumask(currd,
+                                      const_guest_handle_from_ptr(bmap_ptr, void),
+                                      mask);
+            break;
+        }
+        if ( mask )
+            flush_tlb_mask(mask);
+        break;
+
+    case UVMF_INVLPG:
+        switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) )
+        {
+        case UVMF_LOCAL:
+            paging_invlpg(curr, va);
+            break;
+        case UVMF_ALL:
+            mask = currd->domain_dirty_cpumask;
+            break;
+        default:
+            mask = this_cpu(scratch_cpumask);
+            rc = vcpumask_to_pcpumask(currd,
+                                      const_guest_handle_from_ptr(bmap_ptr, void),
+                                      mask);
+            break;
+        }
+        if ( mask )
+            flush_tlb_one_mask(mask, va);
+        break;
+    }
+
+    return rc;
+}
+
+long do_update_va_mapping(unsigned long va, u64 val64,
+                          unsigned long flags)
+{
+    return __do_update_va_mapping(va, val64, flags, current->domain);
+}
+
+long do_update_va_mapping_otherdomain(unsigned long va, u64 val64,
+                                      unsigned long flags,
+                                      domid_t domid)
+{
+    struct domain *pg_owner;
+    int rc;
+
+    if ( (pg_owner = get_pg_owner(domid)) == NULL )
+        return -ESRCH;
+
+    rc = __do_update_va_mapping(va, val64, flags, pg_owner);
+
+    put_pg_owner(pg_owner);
+
+    return rc;
+}
+
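+/* Extended MMU ops: (un)pinning, baseptr switches, TLB/cache flushes, LDT. */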
+long do_mmuext_op(XEN_GUEST_HANDLE_PARAM(mmuext_op_t) uops,
+                  unsigned int count,
+                  XEN_GUEST_HANDLE_PARAM(uint) pdone,
+                  unsigned int foreigndom)
+{
+    struct mmuext_op op;
+    unsigned long type;
+    unsigned int i, done = 0;
+    struct vcpu *curr = current;
+    struct domain *currd = curr->domain;
+    struct domain *pg_owner;
+    int rc = put_old_guest_table(curr);
+
+    if ( unlikely(rc) )
+    {
+        if ( likely(rc == -ERESTART) )
+            rc = hypercall_create_continuation(
+                     __HYPERVISOR_mmuext_op, "hihi", uops, count, pdone,
+                     foreigndom);
+        return rc;
+    }
+
+    if ( unlikely(count == MMU_UPDATE_PREEMPTED) &&
+         likely(guest_handle_is_null(uops)) )
+    {
+        /*
+         * See the curr->arch.old_guest_table related
+         * hypercall_create_continuation() below.
+         */
+        return (int)foreigndom;
+    }
+
+    if ( unlikely(count & MMU_UPDATE_PREEMPTED) )
+    {
+        count &= ~MMU_UPDATE_PREEMPTED;
+        if ( unlikely(!guest_handle_is_null(pdone)) )
+            (void)copy_from_guest(&done, pdone, 1);
+    }
+    else
+        perfc_incr(calls_to_mmuext_op);
+
+    if ( unlikely(!guest_handle_okay(uops, count)) )
+        return -EFAULT;
+
+    if ( (pg_owner = get_pg_owner(foreigndom)) == NULL )
+        return -ESRCH;
+
+    if ( !is_pv_domain(pg_owner) )
+    {
+        put_pg_owner(pg_owner);
+        return -EINVAL;
+    }
+
+    rc = xsm_mmuext_op(XSM_TARGET, currd, pg_owner);
+    if ( rc )
+    {
+        put_pg_owner(pg_owner);
+        return rc;
+    }
+
+    for ( i = 0; i < count; i++ )
+    {
+        if ( curr->arch.old_guest_table || (i && hypercall_preempt_check()) )
+        {
+            rc = -ERESTART;
+            break;
+        }
+
+        if ( unlikely(__copy_from_guest(&op, uops, 1) != 0) )
+        {
+            rc = -EFAULT;
+            break;
+        }
+
+        if ( is_hvm_domain(currd) )
+        {
+            switch ( op.cmd )
+            {
+            case MMUEXT_PIN_L1_TABLE:
+            case MMUEXT_PIN_L2_TABLE:
+            case MMUEXT_PIN_L3_TABLE:
+            case MMUEXT_PIN_L4_TABLE:
+            case MMUEXT_UNPIN_TABLE:
+                break;
+            default:
+                rc = -EOPNOTSUPP;
+                goto done;
+            }
+        }
+
+        rc = 0;
+
+        switch ( op.cmd )
+        {
+            struct page_info *page;
+            p2m_type_t p2mt;
+
+        case MMUEXT_PIN_L1_TABLE:
+            type = PGT_l1_page_table;
+            goto pin_page;
+
+        case MMUEXT_PIN_L2_TABLE:
+            type = PGT_l2_page_table;
+            goto pin_page;
+
+        case MMUEXT_PIN_L3_TABLE:
+            type = PGT_l3_page_table;
+            goto pin_page;
+
+        case MMUEXT_PIN_L4_TABLE:
+            if ( is_pv_32bit_domain(pg_owner) )
+                break;
+            type = PGT_l4_page_table;
+
+        pin_page:
+            /* Ignore pinning of invalid paging levels. */
+            if ( (op.cmd - MMUEXT_PIN_L1_TABLE) > (CONFIG_PAGING_LEVELS - 1) )
+                break;
+
+            if ( paging_mode_refcounts(pg_owner) )
+                break;
+
+            page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC);
+            if ( unlikely(!page) )
+            {
+                rc = -EINVAL;
+                break;
+            }
+
+            rc = get_page_type_preemptible(page, type);
+            if ( unlikely(rc) )
+            {
+                if ( rc == -EINTR )
+                    rc = -ERESTART;
+                else if ( rc != -ERESTART )
+                    gdprintk(XENLOG_WARNING,
+                             "Error %d while pinning mfn %" PRI_mfn "\n",
+                             rc, page_to_mfn(page));
+                if ( page != curr->arch.old_guest_table )
+                    put_page(page);
+                break;
+            }
+
+            rc = xsm_memory_pin_page(XSM_HOOK, currd, pg_owner, page);
+            if ( !rc && unlikely(test_and_set_bit(_PGT_pinned,
+                                                  &page->u.inuse.type_info)) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "mfn %" PRI_mfn " already pinned\n", page_to_mfn(page));
+                rc = -EINVAL;
+            }
+
+            if ( unlikely(rc) )
+                goto pin_drop;
+
+            /* A page is dirtied when its pin status is set. */
+            paging_mark_dirty(pg_owner, _mfn(page_to_mfn(page)));
+
+            /* We can race domain destruction (domain_relinquish_resources). */
+            if ( unlikely(pg_owner != currd) )
+            {
+                bool drop_ref;
+
+                spin_lock(&pg_owner->page_alloc_lock);
+                drop_ref = (pg_owner->is_dying &&
+                            test_and_clear_bit(_PGT_pinned,
+                                               &page->u.inuse.type_info));
+                spin_unlock(&pg_owner->page_alloc_lock);
+                if ( drop_ref )
+                {
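+                    /* Also reached via goto on pin failure above. */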
+        pin_drop:
+                    if ( type == PGT_l1_page_table )
+                        put_page_and_type(page);
+                    else
+                        curr->arch.old_guest_table = page;
+                }
+            }
+            break;
+
+        case MMUEXT_UNPIN_TABLE:
+            if ( paging_mode_refcounts(pg_owner) )
+                break;
+
+            page = get_page_from_gfn(pg_owner, op.arg1.mfn, NULL, P2M_ALLOC);
+            if ( unlikely(!page) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "mfn %" PRI_mfn " bad, or bad owner d%d\n",
+                         op.arg1.mfn, pg_owner->domain_id);
+                rc = -EINVAL;
+                break;
+            }
+
+            if ( !test_and_clear_bit(_PGT_pinned, &page->u.inuse.type_info) )
+            {
+                put_page(page);
+                gdprintk(XENLOG_WARNING,
+                         "mfn %" PRI_mfn " not pinned\n", op.arg1.mfn);
+                rc = -EINVAL;
+                break;
+            }
+
+            switch ( rc = put_page_and_type_preemptible(page) )
+            {
+            case -EINTR:
+            case -ERESTART:
+                curr->arch.old_guest_table = page;
+                rc = 0;
+                break;
+            default:
+                BUG_ON(rc);
+                break;
+            }
+            put_page(page);
+
+            /* A page is dirtied when its pin status is cleared. */
+            paging_mark_dirty(pg_owner, _mfn(page_to_mfn(page)));
+            break;
+
+        case MMUEXT_NEW_BASEPTR:
+            if ( unlikely(currd != pg_owner) )
+                rc = -EPERM;
+            else if ( unlikely(paging_mode_translate(currd)) )
+                rc = -EINVAL;
+            else
+                rc = pv_new_guest_cr3(op.arg1.mfn);
+            break;
+
+        case MMUEXT_NEW_USER_BASEPTR: {
+            unsigned long old_mfn;
+
+            if ( unlikely(currd != pg_owner) )
+                rc = -EPERM;
+            else if ( unlikely(paging_mode_translate(currd)) )
+                rc = -EINVAL;
+            if ( unlikely(rc) )
+                break;
+
+            old_mfn = pagetable_get_pfn(curr->arch.guest_table_user);
+            /*
+             * This is particularly important when getting restarted after the
+             * previous attempt got preempted in the put-old-MFN phase.
+             */
+            if ( old_mfn == op.arg1.mfn )
+                break;
+
+            if ( op.arg1.mfn != 0 )
+            {
+                rc = get_page_and_type_from_mfn(
+                    _mfn(op.arg1.mfn), PGT_root_page_table, currd, 0, true);
+
+                if ( unlikely(rc) )
+                {
+                    if ( rc == -EINTR )
+                        rc = -ERESTART;
+                    else if ( rc != -ERESTART )
+                        gdprintk(XENLOG_WARNING,
+                                 "Error %d installing new mfn %" PRI_mfn "\n",
+                                 rc, op.arg1.mfn);
+                    break;
+                }
+
+                if ( VM_ASSIST(currd, m2p_strict) )
+                    zap_ro_mpt(op.arg1.mfn);
+            }
+
+            curr->arch.guest_table_user = pagetable_from_pfn(op.arg1.mfn);
+
+            if ( old_mfn != 0 )
+            {
+                page = mfn_to_page(old_mfn);
+
+                switch ( rc = put_page_and_type_preemptible(page) )
+                {
+                case -EINTR:
+                    rc = -ERESTART;
+                    /* fallthrough */
+                case -ERESTART:
+                    curr->arch.old_guest_table = page;
+                    break;
+                default:
+                    BUG_ON(rc);
+                    break;
+                }
+            }
+
+            break;
+        }
+
+        case MMUEXT_TLB_FLUSH_LOCAL:
+            if ( likely(currd == pg_owner) )
+                flush_tlb_local();
+            else
+                rc = -EPERM;
+            break;
+
+        case MMUEXT_INVLPG_LOCAL:
+            if ( unlikely(currd != pg_owner) )
+                rc = -EPERM;
+            else
+                paging_invlpg(curr, op.arg1.linear_addr);
+            break;
+
+        case MMUEXT_TLB_FLUSH_MULTI:
+        case MMUEXT_INVLPG_MULTI:
+        {
+            cpumask_t *mask = this_cpu(scratch_cpumask);
+
+            if ( unlikely(currd != pg_owner) )
+                rc = -EPERM;
+            else if ( unlikely(vcpumask_to_pcpumask(currd,
+                                   guest_handle_to_param(op.arg2.vcpumask,
+                                                         const_void),
+                                   mask)) )
+                rc = -EINVAL;
+            if ( unlikely(rc) )
+                break;
+
+            if ( op.cmd == MMUEXT_TLB_FLUSH_MULTI )
+                flush_tlb_mask(mask);
+            else if ( __addr_ok(op.arg1.linear_addr) )
+                flush_tlb_one_mask(mask, op.arg1.linear_addr);
+            break;
+        }
+
+        case MMUEXT_TLB_FLUSH_ALL:
+            if ( likely(currd == pg_owner) )
+                flush_tlb_mask(currd->domain_dirty_cpumask);
+            else
+                rc = -EPERM;
+            break;
+
+        case MMUEXT_INVLPG_ALL:
+            if ( unlikely(currd != pg_owner) )
+                rc = -EPERM;
+            else if ( __addr_ok(op.arg1.linear_addr) )
+                flush_tlb_one_mask(currd->domain_dirty_cpumask,
+                                   op.arg1.linear_addr);
+            break;
+
+        case MMUEXT_FLUSH_CACHE:
+            if ( unlikely(currd != pg_owner) )
+                rc = -EPERM;
+            else if ( unlikely(!cache_flush_permitted(currd)) )
+                rc = -EACCES;
+            else
+                wbinvd();
+            break;
+
+        case MMUEXT_FLUSH_CACHE_GLOBAL:
+            if ( unlikely(currd != pg_owner) )
+                rc = -EPERM;
+            else if ( likely(cache_flush_permitted(currd)) )
+            {
+                unsigned int cpu;
+                cpumask_t *mask = this_cpu(scratch_cpumask);
+
+                cpumask_clear(mask);
+                for_each_online_cpu(cpu)
+                    if ( !cpumask_intersects(mask,
+                                             per_cpu(cpu_sibling_mask, cpu)) )
+                        __cpumask_set_cpu(cpu, mask);
+                flush_mask(mask, FLUSH_CACHE);
+            }
+            else
+                rc = -EINVAL;
+            break;
+
+        case MMUEXT_SET_LDT:
+        {
+            unsigned int ents = op.arg2.nr_ents;
+            unsigned long ptr = ents ? op.arg1.linear_addr : 0;
+
+            if ( unlikely(currd != pg_owner) )
+                rc = -EPERM;
+            else if ( paging_mode_external(currd) )
+                rc = -EINVAL;
+            else if ( ((ptr & (PAGE_SIZE - 1)) != 0) || !__addr_ok(ptr) ||
+                      (ents > 8192) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Bad args to SET_LDT: ptr=%lx, ents=%x\n", ptr, ents);
+                rc = -EINVAL;
+            }
+            else if ( (curr->arch.pv_vcpu.ldt_ents != ents) ||
+                      (curr->arch.pv_vcpu.ldt_base != ptr) )
+            {
+                pv_invalidate_shadow_ldt(curr, false);
+                flush_tlb_local();
+                curr->arch.pv_vcpu.ldt_base = ptr;
+                curr->arch.pv_vcpu.ldt_ents = ents;
+                load_LDT(curr);
+            }
+            break;
+        }
+
+        case MMUEXT_CLEAR_PAGE:
+            page = get_page_from_gfn(pg_owner, op.arg1.mfn, &p2mt, P2M_ALLOC);
+            if ( unlikely(p2mt != p2m_ram_rw) && page )
+            {
+                put_page(page);
+                page = NULL;
+            }
+            if ( !page || !get_page_type(page, PGT_writable_page) )
+            {
+                if ( page )
+                    put_page(page);
+                gdprintk(XENLOG_WARNING,
+                         "Error clearing mfn %" PRI_mfn "\n", op.arg1.mfn);
+                rc = -EINVAL;
+                break;
+            }
+
+            /* A page is dirtied when it's being cleared. */
+            paging_mark_dirty(pg_owner, _mfn(page_to_mfn(page)));
+
+            clear_domain_page(_mfn(page_to_mfn(page)));
+
+            put_page_and_type(page);
+            break;
+
+        case MMUEXT_COPY_PAGE:
+        {
+            struct page_info *src_page, *dst_page;
+
+            src_page = get_page_from_gfn(pg_owner, op.arg2.src_mfn, &p2mt,
+                                         P2M_ALLOC);
+            if ( unlikely(p2mt != p2m_ram_rw) && src_page )
+            {
+                put_page(src_page);
+                src_page = NULL;
+            }
+            if ( unlikely(!src_page) )
+            {
+                gdprintk(XENLOG_WARNING,
+                         "Error copying from mfn %" PRI_mfn "\n",
+                         op.arg2.src_mfn);
+                rc = -EINVAL;
+                break;
+            }
+
+            dst_page = get_page_from_gfn(pg_owner, op.arg1.mfn, &p2mt,
+                                         P2M_ALLOC);
+            if ( unlikely(p2mt != p2m_ram_rw) && dst_page )
+            {
+                put_page(dst_page);
+                dst_page = NULL;
+            }
+            rc = (dst_page &&
+                  get_page_type(dst_page, PGT_writable_page)) ? 0 : -EINVAL;
+            if ( unlikely(rc) )
+            {
+                put_page(src_page);
+                if ( dst_page )
+                    put_page(dst_page);
+                gdprintk(XENLOG_WARNING,
+                         "Error copying to mfn %" PRI_mfn "\n", op.arg1.mfn);
+                break;
+            }
+
+            /* A page is dirtied when it's being copied to. */
+            paging_mark_dirty(pg_owner, _mfn(page_to_mfn(dst_page)));
+
+            copy_domain_page(_mfn(page_to_mfn(dst_page)),
+                             _mfn(page_to_mfn(src_page)));
+
+            put_page_and_type(dst_page);
+            put_page(src_page);
+            break;
+        }
+
+        case MMUEXT_MARK_SUPER:
+        case MMUEXT_UNMARK_SUPER:
+            rc = -EOPNOTSUPP;
+            break;
+
+        default:
+            rc = -ENOSYS;
+            break;
+        }
+
+ done:
+        if ( unlikely(rc) )
+            break;
+
+        guest_handle_add_offset(uops, 1);
+    }
+
+    if ( rc == -ERESTART )
+    {
+        ASSERT(i < count);
+        rc = hypercall_create_continuation(
+            __HYPERVISOR_mmuext_op, "hihi",
+            uops, (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom);
+    }
+    else if ( curr->arch.old_guest_table )
+    {
+        XEN_GUEST_HANDLE_PARAM(void) null;
+
+        ASSERT(rc || i == count);
+        set_xen_guest_handle(null, NULL);
+        /*
+         * In order to have a way to communicate the final return value to
+         * our continuation, we pass this in place of "foreigndom", building
+         * on the fact that this argument isn't needed anymore.
+         */
+        rc = hypercall_create_continuation(
+                __HYPERVISOR_mmuext_op, "hihi", null,
+                MMU_UPDATE_PREEMPTED, null, rc);
+    }
+
+    put_pg_owner(pg_owner);
+
+    perfc_add(num_mmuext_ops, i);
+
+    /* Add incremental work we have done to the @done output parameter. */
+    if ( unlikely(!guest_handle_is_null(pdone)) )
+    {
+        done += i;
+        copy_to_guest(pdone, &done, 1);
+    }
+
+    return rc;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.11.0

* [PATCH v3 extra 08/11] x86/mm: remove the now unused inclusion of pv/mm.h
  2017-07-30 15:43 ` [PATCH v3 extra 00/11] x86: refactor mm.c: page APIs and hypercalls Wei Liu
                     ` (6 preceding siblings ...)
  2017-07-30 15:43   ` [PATCH v3 extra 07/11] x86/mm: move PV hypercalls to pv/mm-hypercalls.c Wei Liu
@ 2017-07-30 15:43   ` Wei Liu
  2017-07-30 15:43   ` [PATCH v3 extra 09/11] x86/mm: use put_page_type_preemptible in put_page_from_l{2, 3}e Wei Liu
                     ` (3 subsequent siblings)
  11 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-30 15:43 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index d232076459..167b318260 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -127,8 +127,6 @@
 #include <asm/pv/grant_table.h>
 #include <asm/pv/mm.h>
 
-#include "pv/mm.h"
-
 /* Mapping of the fixmap space needed early. */
 l1_pgentry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
     l1_fixmap[L1_PAGETABLE_ENTRIES];
-- 
2.11.0



* [PATCH v3 extra 09/11] x86/mm: use put_page_type_preemptible in put_page_from_l{2, 3}e
  2017-07-30 15:43 ` [PATCH v3 extra 00/11] x86: refactor mm.c: page APIs and hypercalls Wei Liu
                     ` (7 preceding siblings ...)
  2017-07-30 15:43   ` [PATCH v3 extra 08/11] x86/mm: remove the now unused inclusion of pv/mm.h Wei Liu
@ 2017-07-30 15:43   ` Wei Liu
  2017-07-30 15:43   ` [PATCH v3 extra 10/11] x86/mm: move {get, put}_page_from_l{2, 3, 4}e Wei Liu
                     ` (2 subsequent siblings)
  11 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-30 15:43 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 167b318260..40fb761d08 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1050,8 +1050,6 @@ int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn)
     return 0;
 }
 
-static int __put_page_type(struct page_info *, int preemptible);
-
 int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn, int partial,
                       bool defer)
 {
@@ -1078,7 +1076,7 @@ int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn, int partial,
     if ( unlikely(partial > 0) )
     {
         ASSERT(!defer);
-        return __put_page_type(pg, 1);
+        return put_page_type_preemptible(pg);
     }
 
     if ( defer )
@@ -1101,7 +1099,7 @@ int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn, int partial,
         if ( unlikely(partial > 0) )
         {
             ASSERT(!defer);
-            return __put_page_type(pg, 1);
+            return put_page_type_preemptible(pg);
         }
 
         if ( defer )
-- 
2.11.0
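
For reference, the attraction of this substitution is that
put_page_type_preemptible() is expected to be nothing more than the
preemptible flavour of the same call, along these lines (a sketch of the
shape in the existing mm.c, not a verbatim quote):

    int put_page_type_preemptible(struct page_info *page)
    {
        return __put_page_type(page, 1 /* preemptible */);
    }

With callers going through the public wrapper, the forward declaration of
__put_page_type() that this patch deletes becomes unnecessary.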



* [PATCH v3 extra 10/11] x86/mm: move {get, put}_page_from_l{2, 3, 4}e
  2017-07-30 15:43 ` [PATCH v3 extra 00/11] x86: refactor mm.c: page APIs and hypercalls Wei Liu
                     ` (8 preceding siblings ...)
  2017-07-30 15:43   ` [PATCH v3 extra 09/11] x86/mm: use put_page_type_preemptible in put_page_from_l{2, 3}e Wei Liu
@ 2017-07-30 15:43   ` Wei Liu
  2017-07-30 15:43   ` [PATCH v3 extra 11/11] x86/mm: move description of x86 page table API to pv/mm.c Wei Liu
  2017-07-31  9:58   ` [PATCH v3 extra 00/11] x86: refactor mm.c: page APIs and hypercalls Jan Beulich
  11 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-30 15:43 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

They are only used by PV code.

Fix coding style issues while moving. Move declarations to the
PV-specific header file.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c           | 253 --------------------------------------------
 xen/arch/x86/pv/mm.c        | 246 ++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/mm.h    |  10 --
 xen/include/asm-x86/pv/mm.h |  29 +++++
 4 files changed, 275 insertions(+), 263 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 40fb761d08..ade3ed2c48 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -511,72 +511,6 @@ int get_page_and_type_from_mfn(mfn_t mfn, unsigned long type, struct domain *d,
     return rc;
 }
 
-static void put_data_page(
-    struct page_info *page, int writeable)
-{
-    if ( writeable )
-        put_page_and_type(page);
-    else
-        put_page(page);
-}
-
-/*
- * We allow root tables to map each other (a.k.a. linear page tables). It
- * needs some special care with reference counts and access permissions:
- *  1. The mapping entry must be read-only, or the guest may get write access
- *     to its own PTEs.
- *  2. We must only bump the reference counts for an *already validated*
- *     L2 table, or we can end up in a deadlock in get_page_type() by waiting
- *     on a validation that is required to complete that validation.
- *  3. We only need to increment the reference counts for the mapped page
- *     frame if it is mapped by a different root table. This is sufficient and
- *     also necessary to allow validation of a root table mapping itself.
- */
-#define define_get_linear_pagetable(level)                                  \
-static int                                                                  \
-get_##level##_linear_pagetable(                                             \
-    level##_pgentry_t pde, unsigned long pde_pfn, struct domain *d)         \
-{                                                                           \
-    unsigned long x, y;                                                     \
-    struct page_info *page;                                                 \
-    unsigned long pfn;                                                      \
-                                                                            \
-    if ( (level##e_get_flags(pde) & _PAGE_RW) )                             \
-    {                                                                       \
-        gdprintk(XENLOG_WARNING,                                            \
-                 "Attempt to create linear p.t. with write perms\n");       \
-        return 0;                                                           \
-    }                                                                       \
-                                                                            \
-    if ( (pfn = level##e_get_pfn(pde)) != pde_pfn )                         \
-    {                                                                       \
-        /* Make sure the mapped frame belongs to the correct domain. */     \
-        if ( unlikely(!get_page_from_mfn(_mfn(pfn), d)) )                   \
-            return 0;                                                       \
-                                                                            \
-        /*                                                                  \
-         * Ensure that the mapped frame is an already-validated page table. \
-         * If so, atomically increment the count (checking for overflow).   \
-         */                                                                 \
-        page = mfn_to_page(pfn);                                            \
-        y = page->u.inuse.type_info;                                        \
-        do {                                                                \
-            x = y;                                                          \
-            if ( unlikely((x & PGT_count_mask) == PGT_count_mask) ||        \
-                 unlikely((x & (PGT_type_mask|PGT_validated)) !=            \
-                          (PGT_##level##_page_table|PGT_validated)) )       \
-            {                                                               \
-                put_page(page);                                             \
-                return 0;                                                   \
-            }                                                               \
-        }                                                                   \
-        while ( (y = cmpxchg(&page->u.inuse.type_info, x, x + 1)) != x );   \
-    }                                                                       \
-                                                                            \
-    return 1;                                                               \
-}
-
-
 bool is_iomem_page(mfn_t mfn)
 {
     struct page_info *page;
@@ -866,108 +800,6 @@ get_page_from_l1e(
 }
 
 
-/* NB. Virtual address 'l2e' maps to a machine address within frame 'pfn'. */
-/*
- * get_page_from_l2e returns:
- *   1 => page not present
- *   0 => success
- *  <0 => error code
- */
-define_get_linear_pagetable(l2);
-int
-get_page_from_l2e(
-    l2_pgentry_t l2e, unsigned long pfn, struct domain *d)
-{
-    unsigned long mfn = l2e_get_pfn(l2e);
-    int rc;
-
-    if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) )
-        return 1;
-
-    if ( unlikely((l2e_get_flags(l2e) & L2_DISALLOW_MASK)) )
-    {
-        gdprintk(XENLOG_WARNING, "Bad L2 flags %x\n",
-                 l2e_get_flags(l2e) & L2_DISALLOW_MASK);
-        return -EINVAL;
-    }
-
-    if ( !(l2e_get_flags(l2e) & _PAGE_PSE) )
-    {
-        rc = get_page_and_type_from_mfn(_mfn(mfn), PGT_l1_page_table, d, 0,
-                                        false);
-        if ( unlikely(rc == -EINVAL) && get_l2_linear_pagetable(l2e, pfn, d) )
-            rc = 0;
-        return rc;
-    }
-
-    return -EINVAL;
-}
-
-
-/*
- * get_page_from_l3e returns:
- *   1 => page not present
- *   0 => success
- *  <0 => error code
- */
-define_get_linear_pagetable(l3);
-int
-get_page_from_l3e(
-    l3_pgentry_t l3e, unsigned long pfn, struct domain *d, int partial)
-{
-    int rc;
-
-    if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) )
-        return 1;
-
-    if ( unlikely((l3e_get_flags(l3e) & l3_disallow_mask(d))) )
-    {
-        gdprintk(XENLOG_WARNING, "Bad L3 flags %x\n",
-                 l3e_get_flags(l3e) & l3_disallow_mask(d));
-        return -EINVAL;
-    }
-
-    rc = get_page_and_type_from_mfn(_mfn(l3e_get_pfn(l3e)), PGT_l2_page_table,
-                                    d, partial, true);
-    if ( unlikely(rc == -EINVAL) &&
-         !is_pv_32bit_domain(d) &&
-         get_l3_linear_pagetable(l3e, pfn, d) )
-        rc = 0;
-
-    return rc;
-}
-
-/*
- * get_page_from_l4e returns:
- *   1 => page not present
- *   0 => success
- *  <0 => error code
- */
-define_get_linear_pagetable(l4);
-int
-get_page_from_l4e(
-    l4_pgentry_t l4e, unsigned long pfn, struct domain *d, int partial)
-{
-    int rc;
-
-    if ( !(l4e_get_flags(l4e) & _PAGE_PRESENT) )
-        return 1;
-
-    if ( unlikely((l4e_get_flags(l4e) & L4_DISALLOW_MASK)) )
-    {
-        gdprintk(XENLOG_WARNING, "Bad L4 flags %x\n",
-                 l4e_get_flags(l4e) & L4_DISALLOW_MASK);
-        return -EINVAL;
-    }
-
-    rc = get_page_and_type_from_mfn(_mfn(l4e_get_pfn(l4e)), PGT_l3_page_table,
-                                    d, partial, true);
-    if ( unlikely(rc == -EINVAL) && get_l4_linear_pagetable(l4e, pfn, d) )
-        rc = 0;
-
-    return rc;
-}
-
 void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
 {
     unsigned long     pfn = l1e_get_pfn(l1e);
@@ -1028,91 +860,6 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
 }
 
 
-/*
- * NB. Virtual address 'l2e' maps to a machine address within frame 'pfn'.
- * Note also that this automatically deals correctly with linear p.t.'s.
- */
-int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn)
-{
-    if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) || (l2e_get_pfn(l2e) == pfn) )
-        return 1;
-
-    if ( l2e_get_flags(l2e) & _PAGE_PSE )
-    {
-        struct page_info *page = mfn_to_page(l2e_get_pfn(l2e));
-        unsigned int i;
-
-        for ( i = 0; i < (1u << PAGETABLE_ORDER); i++, page++ )
-            put_page_and_type(page);
-    } else
-        put_page_and_type(l2e_get_page(l2e));
-
-    return 0;
-}
-
-int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn, int partial,
-                      bool defer)
-{
-    struct page_info *pg;
-
-    if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) || (l3e_get_pfn(l3e) == pfn) )
-        return 1;
-
-    if ( unlikely(l3e_get_flags(l3e) & _PAGE_PSE) )
-    {
-        unsigned long mfn = l3e_get_pfn(l3e);
-        int writeable = l3e_get_flags(l3e) & _PAGE_RW;
-
-        ASSERT(!(mfn & ((1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT)) - 1)));
-        do {
-            put_data_page(mfn_to_page(mfn), writeable);
-        } while ( ++mfn & ((1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT)) - 1) );
-
-        return 0;
-    }
-
-    pg = l3e_get_page(l3e);
-
-    if ( unlikely(partial > 0) )
-    {
-        ASSERT(!defer);
-        return put_page_type_preemptible(pg);
-    }
-
-    if ( defer )
-    {
-        current->arch.old_guest_table = pg;
-        return 0;
-    }
-
-    return put_page_and_type_preemptible(pg);
-}
-
-int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn, int partial,
-                      bool defer)
-{
-    if ( (l4e_get_flags(l4e) & _PAGE_PRESENT) &&
-         (l4e_get_pfn(l4e) != pfn) )
-    {
-        struct page_info *pg = l4e_get_page(l4e);
-
-        if ( unlikely(partial > 0) )
-        {
-            ASSERT(!defer);
-            return put_page_type_preemptible(pg);
-        }
-
-        if ( defer )
-        {
-            current->arch.old_guest_table = pg;
-            return 0;
-        }
-
-        return put_page_and_type_preemptible(pg);
-    }
-    return 1;
-}
-
 bool fill_ro_mpt(unsigned long mfn)
 {
     l4_pgentry_t *l4tab = map_domain_page(_mfn(mfn));
diff --git a/xen/arch/x86/pv/mm.c b/xen/arch/x86/pv/mm.c
index 19b2ae588e..ad35808c51 100644
--- a/xen/arch/x86/pv/mm.c
+++ b/xen/arch/x86/pv/mm.c
@@ -777,6 +777,252 @@ void pv_invalidate_shadow_ldt(struct vcpu *v, bool flush)
     spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock);
 }
 
+/*
+ * We allow root tables to map each other (a.k.a. linear page tables). It
+ * needs some special care with reference counts and access permissions:
+ *  1. The mapping entry must be read-only, or the guest may get write access
+ *     to its own PTEs.
+ *  2. We must only bump the reference counts for an *already validated*
+ *     L2 table, or we can end up in a deadlock in get_page_type() by waiting
+ *     on a validation that is required to complete that validation.
+ *  3. We only need to increment the reference counts for the mapped page
+ *     frame if it is mapped by a different root table. This is sufficient and
+ *     also necessary to allow validation of a root table mapping itself.
+ */
+#define define_get_linear_pagetable(level)                                  \
+static int                                                                  \
+get_##level##_linear_pagetable(                                             \
+    level##_pgentry_t pde, unsigned long pde_pfn, struct domain *d)         \
+{                                                                           \
+    unsigned long x, y;                                                     \
+    struct page_info *page;                                                 \
+    unsigned long pfn;                                                      \
+                                                                            \
+    if ( (level##e_get_flags(pde) & _PAGE_RW) )                             \
+    {                                                                       \
+        gdprintk(XENLOG_WARNING,                                            \
+                 "Attempt to create linear p.t. with write perms\n");       \
+        return 0;                                                           \
+    }                                                                       \
+                                                                            \
+    if ( (pfn = level##e_get_pfn(pde)) != pde_pfn )                         \
+    {                                                                       \
+        /* Make sure the mapped frame belongs to the correct domain. */     \
+        if ( unlikely(!get_page_from_mfn(_mfn(pfn), d)) )                   \
+            return 0;                                                       \
+                                                                            \
+        /*                                                                  \
+         * Ensure that the mapped frame is an already-validated page table. \
+         * If so, atomically increment the count (checking for overflow).   \
+         */                                                                 \
+        page = mfn_to_page(pfn);                                            \
+        y = page->u.inuse.type_info;                                        \
+        do {                                                                \
+            x = y;                                                          \
+            if ( unlikely((x & PGT_count_mask) == PGT_count_mask) ||        \
+                 unlikely((x & (PGT_type_mask|PGT_validated)) !=            \
+                          (PGT_##level##_page_table|PGT_validated)) )       \
+            {                                                               \
+                put_page(page);                                             \
+                return 0;                                                   \
+            }                                                               \
+        }                                                                   \
+        while ( (y = cmpxchg(&page->u.inuse.type_info, x, x + 1)) != x );   \
+    }                                                                       \
+                                                                            \
+    return 1;                                                               \
+}
+
+/* NB. Virtual address 'l2e' maps to a machine address within frame 'pfn'. */
+/*
+ * get_page_from_l2e returns:
+ *   1 => page not present
+ *   0 => success
+ *  <0 => error code
+ */
+define_get_linear_pagetable(l2);
+int get_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn, struct domain *d)
+{
+    unsigned long mfn = l2e_get_pfn(l2e);
+    int rc;
+
+    if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) )
+        return 1;
+
+    if ( unlikely((l2e_get_flags(l2e) & L2_DISALLOW_MASK)) )
+    {
+        gdprintk(XENLOG_WARNING, "Bad L2 flags %x\n",
+                 l2e_get_flags(l2e) & L2_DISALLOW_MASK);
+        return -EINVAL;
+    }
+
+    if ( !(l2e_get_flags(l2e) & _PAGE_PSE) )
+    {
+        rc = get_page_and_type_from_mfn(_mfn(mfn), PGT_l1_page_table, d, 0,
+                                        false);
+        if ( unlikely(rc == -EINVAL) && get_l2_linear_pagetable(l2e, pfn, d) )
+            rc = 0;
+        return rc;
+    }
+
+    return -EINVAL;
+}
+
+/*
+ * get_page_from_l3e returns:
+ *   1 => page not present
+ *   0 => success
+ *  <0 => error code
+ */
+define_get_linear_pagetable(l3);
+int get_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn, struct domain *d,
+                      int partial)
+{
+    int rc;
+
+    if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) )
+        return 1;
+
+    if ( unlikely((l3e_get_flags(l3e) & l3_disallow_mask(d))) )
+    {
+        gdprintk(XENLOG_WARNING, "Bad L3 flags %x\n",
+                 l3e_get_flags(l3e) & l3_disallow_mask(d));
+        return -EINVAL;
+    }
+
+    rc = get_page_and_type_from_mfn(_mfn(l3e_get_pfn(l3e)), PGT_l2_page_table,
+                                    d, partial, true);
+    if ( unlikely(rc == -EINVAL) &&
+         !is_pv_32bit_domain(d) &&
+         get_l3_linear_pagetable(l3e, pfn, d) )
+        rc = 0;
+
+    return rc;
+}
+
+/*
+ * get_page_from_l4e returns:
+ *   1 => page not present
+ *   0 => success
+ *  <0 => error code
+ */
+define_get_linear_pagetable(l4);
+int get_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn, struct domain *d,
+                      int partial)
+{
+    int rc;
+
+    if ( !(l4e_get_flags(l4e) & _PAGE_PRESENT) )
+        return 1;
+
+    if ( unlikely((l4e_get_flags(l4e) & L4_DISALLOW_MASK)) )
+    {
+        gdprintk(XENLOG_WARNING, "Bad L4 flags %x\n",
+                 l4e_get_flags(l4e) & L4_DISALLOW_MASK);
+        return -EINVAL;
+    }
+
+    rc = get_page_and_type_from_mfn(_mfn(l4e_get_pfn(l4e)), PGT_l3_page_table,
+                                    d, partial, true);
+    if ( unlikely(rc == -EINVAL) && get_l4_linear_pagetable(l4e, pfn, d) )
+        rc = 0;
+
+    return rc;
+}
+
+/*
+ * NB. Virtual address 'l2e' maps to a machine address within frame 'pfn'.
+ * Note also that this automatically deals correctly with linear p.t.'s.
+ */
+int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn)
+{
+    if ( !(l2e_get_flags(l2e) & _PAGE_PRESENT) || (l2e_get_pfn(l2e) == pfn) )
+        return 1;
+
+    if ( l2e_get_flags(l2e) & _PAGE_PSE )
+    {
+        struct page_info *page = mfn_to_page(l2e_get_pfn(l2e));
+        unsigned int i;
+
+        for ( i = 0; i < (1u << PAGETABLE_ORDER); i++, page++ )
+            put_page_and_type(page);
+    } else
+        put_page_and_type(l2e_get_page(l2e));
+
+    return 0;
+}
+
+static void put_data_page(struct page_info *page, bool writeable)
+{
+    if ( writeable )
+        put_page_and_type(page);
+    else
+        put_page(page);
+}
+
+int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn, int partial,
+                      bool defer)
+{
+    struct page_info *pg;
+
+    if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) || (l3e_get_pfn(l3e) == pfn) )
+        return 1;
+
+    if ( unlikely(l3e_get_flags(l3e) & _PAGE_PSE) )
+    {
+        unsigned long mfn = l3e_get_pfn(l3e);
+        int writeable = l3e_get_flags(l3e) & _PAGE_RW;
+
+        ASSERT(!(mfn & ((1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT)) - 1)));
+        do {
+            put_data_page(mfn_to_page(mfn), writeable);
+        } while ( ++mfn & ((1UL << (L3_PAGETABLE_SHIFT - PAGE_SHIFT)) - 1) );
+
+        return 0;
+    }
+
+    pg = l3e_get_page(l3e);
+
+    if ( unlikely(partial > 0) )
+    {
+        ASSERT(!defer);
+        return put_page_type_preemptible(pg);
+    }
+
+    if ( defer )
+    {
+        current->arch.old_guest_table = pg;
+        return 0;
+    }
+
+    return put_page_and_type_preemptible(pg);
+}
+
+int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn, int partial,
+                      bool defer)
+{
+    if ( (l4e_get_flags(l4e) & _PAGE_PRESENT) &&
+         (l4e_get_pfn(l4e) != pfn) )
+    {
+        struct page_info *pg = l4e_get_page(l4e);
+
+        if ( unlikely(partial > 0) )
+        {
+            ASSERT(!defer);
+            return put_page_type_preemptible(pg);
+        }
+
+        if ( defer )
+        {
+            current->arch.old_guest_table = pg;
+            return 0;
+        }
+
+        return put_page_and_type_preemptible(pg);
+    }
+    return 1;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 7480341240..4eeaf709c1 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -358,16 +358,6 @@ int  put_old_guest_table(struct vcpu *);
 int  get_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner,
                        struct domain *pg_owner);
 void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner);
-int get_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn, struct domain *d);
-int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn);
-int get_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn, struct domain *d,
-                      int partial);
-int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn, int partial,
-                      bool defer);
-int get_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn, struct domain *d,
-                      int partial);
-int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn, int partial,
-                      bool defer);
 void get_page_light(struct page_info *page);
 bool get_page_from_mfn(mfn_t mfn, struct domain *d);
 int get_page_and_type_from_mfn(mfn_t mfn, unsigned long type, struct domain *d,
diff --git a/xen/include/asm-x86/pv/mm.h b/xen/include/asm-x86/pv/mm.h
index 664d7c3868..fb6dbb97ee 100644
--- a/xen/include/asm-x86/pv/mm.h
+++ b/xen/include/asm-x86/pv/mm.h
@@ -103,6 +103,17 @@ int pv_free_page_type(struct page_info *page, unsigned long type,
 
 void pv_invalidate_shadow_ldt(struct vcpu *v, bool flush);
 
+int get_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn, struct domain *d);
+int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn);
+int get_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn, struct domain *d,
+                      int partial);
+int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn, int partial,
+                      bool defer);
+int get_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn, struct domain *d,
+                      int partial);
+int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn, int partial,
+                      bool defer);
+
 #else
 
 #include <xen/errno.h>
@@ -142,6 +153,24 @@ static inline int pv_free_page_type(struct page_info *page, unsigned long type,
 
 static inline void pv_invalidate_shadow_ldt(struct vcpu *v, bool flush) {}
 
+static inline int get_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn,
+                                    struct domain *d)
+{ return -EINVAL; }
+static inline int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn)
+{ return -EINVAL; }
+static inline int get_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn,
+                                    struct domain *d, int partial)
+{ return -EINVAL; }
+static inline int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn,
+                                    int partial, bool defer)
+{ return -EINVAL; }
+static inline int get_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn,
+                                    struct domain *d, int partial)
+{ return -EINVAL; }
+static inline int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn,
+                                    int partial, bool defer)
+{ return -EINVAL; }
+
 #endif
 
 #endif /* __X86_PV_MM_H__ */
-- 
2.11.0
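
The tri-state return convention preserved by this move (1 => entry not
present, 0 => success with references taken, <0 => error) shapes every
caller of these functions. A sketch of the canonical caller pattern (the
wrapper function is hypothetical; the callee and its contract are as
documented above):

    /* Sketch: validate one L3 slot using the tri-state convention. */
    static int validate_l3_slot(l3_pgentry_t l3e, unsigned long pfn,
                                struct domain *d, int partial)
    {
        int rc = get_page_from_l3e(l3e, pfn, d, partial);

        if ( rc < 0 )
            return rc;  /* Propagate the error code. */
        if ( rc == 1 )
            return 0;   /* Not present: no references were taken. */
        /*
         * rc == 0: references were taken and must eventually be
         * balanced by put_page_from_l3e().
         */
        return 0;
    }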



* [PATCH v3 extra 11/11] x86/mm: move description of x86 page table API to pv/mm.c
  2017-07-30 15:43 ` [PATCH v3 extra 00/11] x86: refactor mm.c: page APIs and hypercalls Wei Liu
                     ` (9 preceding siblings ...)
  2017-07-30 15:43   ` [PATCH v3 extra 10/11] x86/mm: move {get, put}_page_from_l{2, 3, 4}e Wei Liu
@ 2017-07-30 15:43   ` Wei Liu
  2017-07-31  9:58   ` [PATCH v3 extra 00/11] x86: refactor mm.c: page APIs and hypercalls Jan Beulich
  11 siblings, 0 replies; 39+ messages in thread
From: Wei Liu @ 2017-07-30 15:43 UTC (permalink / raw)
  To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/x86/mm.c    | 65 ----------------------------------------------------
 xen/arch/x86/pv/mm.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+), 65 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index ade3ed2c48..75c84d2275 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -18,71 +18,6 @@
  * along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
-/*
- * A description of the x86 page table API:
- *
- * Domains trap to do_mmu_update with a list of update requests.
- * This is a list of (ptr, val) pairs, where the requested operation
- * is *ptr = val.
- *
- * Reference counting of pages:
- * ----------------------------
- * Each page has two refcounts: tot_count and type_count.
- *
- * TOT_COUNT is the obvious reference count. It counts all uses of a
- * physical page frame by a domain, including uses as a page directory,
- * a page table, or simple mappings via a PTE. This count prevents a
- * domain from releasing a frame back to the free pool when it still holds
- * a reference to it.
- *
- * TYPE_COUNT is more subtle. A frame can be put to one of three
- * mutually-exclusive uses: it might be used as a page directory, or a
- * page table, or it may be mapped writable by the domain [of course, a
- * frame may not be used in any of these three ways!].
- * So, type_count is a count of the number of times a frame is being
- * referred to in its current incarnation. Therefore, a page can only
- * change its type when its type count is zero.
- *
- * Pinning the page type:
- * ----------------------
- * The type of a page can be pinned/unpinned with the commands
- * MMUEXT_[UN]PIN_L?_TABLE. Each page can be pinned exactly once (that is,
- * pinning is not reference counted, so it can't be nested).
- * This is useful to prevent a page's type count falling to zero, at which
- * point safety checks would need to be carried out next time the count
- * is increased again.
- *
- * A further note on writable page mappings:
- * -----------------------------------------
- * For simplicity, the count of writable mappings for a page may not
- * correspond to reality. The 'writable count' is incremented for every
- * PTE which maps the page with the _PAGE_RW flag set. However, for
- * write access to be possible the page directory entry must also have
- * its _PAGE_RW bit set. We do not check this as it complicates the
- * reference counting considerably [consider the case of multiple
- * directory entries referencing a single page table, some with the RW
- * bit set, others not -- it starts getting a bit messy].
- * In normal use, this simplification shouldn't be a problem.
- * However, the logic can be added if required.
- *
- * One more note on read-only page mappings:
- * -----------------------------------------
- * We want domains to be able to map pages for read-only access. The
- * main reason is that page tables and directories should be readable
- * by a domain, but it would not be safe for them to be writable.
- * However, domains have free access to rings 1 & 2 of the Intel
- * privilege model. In terms of page protection, these are considered
- * to be part of 'supervisor mode'. The WP bit in CR0 controls whether
- * read-only restrictions are respected in supervisor mode -- if the
- * bit is clear then any mapped page is writable.
- *
- * We get round this by always setting the WP bit and disallowing
- * updates to it. This is very unlikely to cause a problem for guest
- * OS's, which will generally use the WP bit to simplify copy-on-write
- * implementation (in that case, OS wants a fault when it writes to
- * an application-supplied buffer).
- */
-
 #include <xen/init.h>
 #include <xen/kernel.h>
 #include <xen/lib.h>
diff --git a/xen/arch/x86/pv/mm.c b/xen/arch/x86/pv/mm.c
index ad35808c51..39e6a3bc9a 100644
--- a/xen/arch/x86/pv/mm.c
+++ b/xen/arch/x86/pv/mm.c
@@ -20,6 +20,71 @@
  * along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
+/*
+ * A description of the x86 page table API:
+ *
+ * Domains trap to do_mmu_update with a list of update requests.
+ * This is a list of (ptr, val) pairs, where the requested operation
+ * is *ptr = val.
+ *
+ * Reference counting of pages:
+ * ----------------------------
+ * Each page has two refcounts: tot_count and type_count.
+ *
+ * TOT_COUNT is the obvious reference count. It counts all uses of a
+ * physical page frame by a domain, including uses as a page directory,
+ * a page table, or simple mappings via a PTE. This count prevents a
+ * domain from releasing a frame back to the free pool when it still holds
+ * a reference to it.
+ *
+ * TYPE_COUNT is more subtle. A frame can be put to one of three
+ * mutually-exclusive uses: it might be used as a page directory, or a
+ * page table, or it may be mapped writable by the domain [of course, a
+ * frame may not be used in any of these three ways!].
+ * So, type_count is a count of the number of times a frame is being
+ * referred to in its current incarnation. Therefore, a page can only
+ * change its type when its type count is zero.
+ *
+ * Pinning the page type:
+ * ----------------------
+ * The type of a page can be pinned/unpinned with the commands
+ * MMUEXT_[UN]PIN_L?_TABLE. Each page can be pinned exactly once (that is,
+ * pinning is not reference counted, so it can't be nested).
+ * This is useful to prevent a page's type count falling to zero, at which
+ * point safety checks would need to be carried out next time the count
+ * is increased again.
+ *
+ * A further note on writable page mappings:
+ * -----------------------------------------
+ * For simplicity, the count of writable mappings for a page may not
+ * correspond to reality. The 'writable count' is incremented for every
+ * PTE which maps the page with the _PAGE_RW flag set. However, for
+ * write access to be possible the page directory entry must also have
+ * its _PAGE_RW bit set. We do not check this as it complicates the
+ * reference counting considerably [consider the case of multiple
+ * directory entries referencing a single page table, some with the RW
+ * bit set, others not -- it starts getting a bit messy].
+ * In normal use, this simplification shouldn't be a problem.
+ * However, the logic can be added if required.
+ *
+ * One more note on read-only page mappings:
+ * -----------------------------------------
+ * We want domains to be able to map pages for read-only access. The
+ * main reason is that page tables and directories should be readable
+ * by a domain, but it would not be safe for them to be writable.
+ * However, domains have free access to rings 1 & 2 of the Intel
+ * privilege model. In terms of page protection, these are considered
+ * to be part of 'supervisor mode'. The WP bit in CR0 controls whether
+ * read-only restrictions are respected in supervisor mode -- if the
+ * bit is clear then any mapped page is writable.
+ *
+ * We get round this by always setting the WP bit and disallowing
+ * updates to it. This is very unlikely to cause a problem for guest
+ * OS's, which will generally use the WP bit to simplify copy-on-write
+ * implementation (in that case, OS wants a fault when it writes to
+ * an application-supplied buffer).
+ */
+
 #include <xen/event.h>
 #include <xen/guest_access.h>
 
-- 
2.11.0
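
The tot_count/type_count scheme described in the moved comment maps
directly onto the get/put helpers used throughout this series. A sketch
of the idiomatic pairing for a writable mapping (error handling
abbreviated; 'page' and 'd' are assumed to be in scope):

    /* Take both references needed to use 'page' as a writable mapping. */
    if ( !get_page(page, d) )                      /* bump tot_count */
        return -EINVAL;
    if ( !get_page_type(page, PGT_writable_page) ) /* bump type_count */
    {
        put_page(page);
        return -EINVAL;
    }
    /* ... use the mapping ... */
    put_page_and_type(page);  /* drop type_count, then tot_count */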



* Re: [PATCH v3 extra 00/11] x86: refactor mm.c: page APIs and hypercalls
  2017-07-30 15:43 ` [PATCH v3 extra 00/11] x86: refactor mm.c: page APIs and hypercalls Wei Liu
                     ` (10 preceding siblings ...)
  2017-07-30 15:43   ` [PATCH v3 extra 11/11] x86/mm: move description of x86 page table API to pv/mm.c Wei Liu
@ 2017-07-31  9:58   ` Jan Beulich
  11 siblings, 0 replies; 39+ messages in thread
From: Jan Beulich @ 2017-07-31  9:58 UTC (permalink / raw)
  To: wei.liu2; +Cc: george.dunlap, andrew.cooper3, xen-devel

>>> Wei Liu <wei.liu2@citrix.com> 07/30/17 5:43 PM >>>
>Note that in the stubs I choose to return EINVAL but maybe we should just BUG()
>there because those paths aren't supposed to be taken when !CONFIG_PV. And I'm
>sure common code will BUG_ON() or BUG() sooner or later. Thoughts?

BUG() - yes, please.

Jan
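
Applied to the !CONFIG_PV stubs from patch 10, that suggestion would give
them roughly this shape (illustrative only; not the committed form):

    static inline int get_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn,
                                        struct domain *d)
    {
        BUG();          /* Unreachable without CONFIG_PV. */
        return -EINVAL; /* Not reached; keeps the return path well formed. */
    }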



* Re: [PATCH v3 01/21] x86/mm: carve out create_grant_pv_mapping
  2017-07-20 16:04 ` [PATCH v3 01/21] x86/mm: carve out create_grant_pv_mapping Wei Liu
@ 2017-08-28 15:16   ` George Dunlap
  0 siblings, 0 replies; 39+ messages in thread
From: George Dunlap @ 2017-08-28 15:16 UTC (permalink / raw)
  To: Wei Liu, Xen-devel; +Cc: George Dunlap, Andrew Cooper, Jan Beulich

On 07/20/2017 05:04 PM, Wei Liu wrote:
> And at once make create_grant_host_mapping an inline function.  This

"At once" means "immediately" or "without any delay between this event
and the preceding event", which doesn't make sense here.  I think you
want, "At the same time."

Other than that, looks good:

Acked-by: George Dunlap <george.dunlap@citrix.com>



* Re: [PATCH v3 02/21] x86/mm: carve out replace_grant_pv_mapping
  2017-07-20 16:04 ` [PATCH v3 02/21] x86/mm: carve out replace_grant_pv_mapping Wei Liu
@ 2017-08-28 15:19   ` George Dunlap
  0 siblings, 0 replies; 39+ messages in thread
From: George Dunlap @ 2017-08-28 15:19 UTC (permalink / raw)
  To: Wei Liu, Xen-devel; +Cc: George Dunlap, Andrew Cooper, Jan Beulich

On 07/20/2017 05:04 PM, Wei Liu wrote:
> And at once make it an inline function. Add declarations of

at the same time

Other than that:

Acked-by: George Dunlap <george.dunlap@citrix.com>

> replace_grant_{hvm,pv}_mapping to respective header files.
> 
> The code movement will be done later.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  xen/arch/x86/mm.c                     |  9 +++------
>  xen/include/asm-x86/grant_table.h     | 10 ++++++++--
>  xen/include/asm-x86/hvm/grant_table.h |  8 ++++++++
>  xen/include/asm-x86/pv/grant_table.h  |  8 ++++++++
>  4 files changed, 27 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
> index 532b1ee7e7..defc2c9bcc 100644
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -4296,7 +4296,7 @@ int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
>      return create_grant_va_mapping(addr, pte, current);
>  }
>  
> -static int replace_grant_p2m_mapping(
> +int replace_grant_p2m_mapping(
>      uint64_t addr, unsigned long frame, uint64_t new_addr, unsigned int flags)
>  {
>      unsigned long gfn = (unsigned long)(addr >> PAGE_SHIFT);
> @@ -4326,8 +4326,8 @@ static int replace_grant_p2m_mapping(
>      return GNTST_okay;
>  }
>  
> -int replace_grant_host_mapping(
> -    uint64_t addr, unsigned long frame, uint64_t new_addr, unsigned int flags)
> +int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
> +                             uint64_t new_addr, unsigned int flags)
>  {
>      struct vcpu *curr = current;
>      l1_pgentry_t *pl1e, ol1e;
> @@ -4335,9 +4335,6 @@ int replace_grant_host_mapping(
>      struct page_info *l1pg;
>      int rc;
>  
> -    if ( paging_mode_external(current->domain) )
> -        return replace_grant_p2m_mapping(addr, frame, new_addr, flags);
> -
>      if ( flags & GNTMAP_contains_pte )
>      {
>          if ( !new_addr )
> diff --git a/xen/include/asm-x86/grant_table.h b/xen/include/asm-x86/grant_table.h
> index 4aa22126d3..6c98672a4d 100644
> --- a/xen/include/asm-x86/grant_table.h
> +++ b/xen/include/asm-x86/grant_table.h
> @@ -27,8 +27,14 @@ static inline int create_grant_host_mapping(uint64_t addr, unsigned long frame,
>      return create_grant_pv_mapping(addr, frame, flags, cache_flags);
>  }
>  
> -int replace_grant_host_mapping(
> -    uint64_t addr, unsigned long frame, uint64_t new_addr, unsigned int flags);
> +static inline int replace_grant_host_mapping(uint64_t addr, unsigned long frame,
> +                                             uint64_t new_addr,
> +                                             unsigned int flags)
> +{
> +    if ( paging_mode_external(current->domain) )
> +        return replace_grant_p2m_mapping(addr, frame, new_addr, flags);
> +    return replace_grant_pv_mapping(addr, frame, new_addr, flags);
> +}
>  
>  #define gnttab_create_shared_page(d, t, i)                               \
>      do {                                                                 \
> diff --git a/xen/include/asm-x86/hvm/grant_table.h b/xen/include/asm-x86/hvm/grant_table.h
> index 83202c219c..4b1afa179b 100644
> --- a/xen/include/asm-x86/hvm/grant_table.h
> +++ b/xen/include/asm-x86/hvm/grant_table.h
> @@ -26,6 +26,8 @@
>  int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
>                               unsigned int flags,
>                               unsigned int cache_flags);
> +int replace_grant_p2m_mapping(uint64_t addr, unsigned long frame,
> +                              uint64_t new_addr, unsigned int flags);
>  
>  #else
>  
> @@ -38,6 +40,12 @@ static inline int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
>      return GNTST_general_error;
>  }
>  
> +static inline int replace_grant_p2m_mapping(uint64_t addr, unsigned long frame,
> +                                             uint64_t new_addr, unsigned int flags)
> +{
> +    return GNTST_general_error;
> +}
> +
>  #endif
>  
>  #endif /* __X86_HVM_GRANT_TABLE_H__ */
> diff --git a/xen/include/asm-x86/pv/grant_table.h b/xen/include/asm-x86/pv/grant_table.h
> index 165ebce22f..c6474973cd 100644
> --- a/xen/include/asm-x86/pv/grant_table.h
> +++ b/xen/include/asm-x86/pv/grant_table.h
> @@ -25,6 +25,8 @@
>  
>  int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
>                              unsigned int flags, unsigned int cache_flags);
> +int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
> +                             uint64_t new_addr, unsigned int flags);
>  
>  #else
>  
> @@ -37,6 +39,12 @@ static inline int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
>      return GNTST_general_error;
>  }
>  
> +static inline int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
> +                                            uint64_t new_addr, unsigned int flags)
> +{
> +    return GNTST_general_error;
> +}
> +
>  #endif
>  
>  #endif /* __X86_PV_GRANT_TABLE_H__ */
> 



end of thread, other threads:[~2017-08-28 15:20 UTC | newest]

Thread overview: 39+ messages
2017-07-20 16:04 [PATCH v3 00/21] x86: refactor mm.c (the easy part) Wei Liu
2017-07-20 16:04 ` [PATCH v3 01/21] x86/mm: carve out create_grant_pv_mapping Wei Liu
2017-08-28 15:16   ` George Dunlap
2017-07-20 16:04 ` [PATCH v3 02/21] x86/mm: carve out replace_grant_pv_mapping Wei Liu
2017-08-28 15:19   ` George Dunlap
2017-07-20 16:04 ` [PATCH v3 03/21] x86/mm: split HVM grant table code to hvm/grant_table.c Wei Liu
2017-07-20 16:04 ` [PATCH v3 04/21] x86/mm: lift PAGE_CACHE_ATTRS to page.h Wei Liu
2017-07-20 16:04 ` [PATCH v3 05/21] x86/mm: document the return values from get_page_from_l*e Wei Liu
2017-07-20 16:04 ` [PATCH v3 06/21] x86: move pv_emul_is_mem_write to pv/emulate.c Wei Liu
2017-07-20 16:04 ` [PATCH v3 07/21] x86/mm: move and rename guest_get_eff{, kern}_l1e Wei Liu
2017-07-20 16:04 ` [PATCH v3 08/21] x86/mm: export get_page_from_pagenr Wei Liu
2017-07-20 16:04 ` [PATCH v3 09/21] x86/mm: rename and move update_intpte Wei Liu
2017-07-20 16:04 ` [PATCH v3 10/21] x86/mm: move {un, }adjust_guest_* to pv/mm.h Wei Liu
2017-07-20 16:04 ` [PATCH v3 11/21] x86/mm: split out writable pagetable emulation code Wei Liu
2017-07-20 16:04 ` [PATCH v3 12/21] x86/mm: split out readonly MMIO " Wei Liu
2017-07-20 16:04 ` [PATCH v3 13/21] x86/mm: remove the unused inclusion of pv/emulate.h Wei Liu
2017-07-20 16:04 ` [PATCH v3 14/21] x86/mm: move and rename guest_{, un}map_l1e Wei Liu
2017-07-20 16:04 ` [PATCH v3 15/21] x86/mm: split out PV grant table code Wei Liu
2017-07-20 16:04 ` [PATCH v3 16/21] x86/mm: split out descriptor " Wei Liu
2017-07-20 16:04 ` [PATCH v3 17/21] x86/mm: move compat descriptor handling code Wei Liu
2017-07-20 16:04 ` [PATCH v3 18/21] x86/mm: move and rename map_ldt_shadow_page Wei Liu
2017-07-20 16:04 ` [PATCH v3 19/21] x86/mm: factor out pv_arch_init_memory Wei Liu
2017-07-20 16:04 ` [PATCH v3 20/21] x86/mm: move l4 table setup code Wei Liu
2017-07-20 16:04 ` [PATCH v3 21/21] x86/mm: add "pv_" prefix to new_guest_cr3 Wei Liu
2017-07-30  6:26 ` [PATCH v3 00/21] x86: refactor mm.c (the easy part) Jan Beulich
2017-07-30  9:23   ` Wei Liu
2017-07-30 15:43 ` [PATCH v3 extra 00/11] x86: refactor mm.c: page APIs and hypercalls Wei Liu
2017-07-30 15:43   ` [PATCH v3 extra 01/11] x86: add pv_ prefix to {alloc, free}_page_type Wei Liu
2017-07-30 15:43   ` [PATCH v3 extra 02/11] x86/mm: export more get/put page functions Wei Liu
2017-07-30 15:43   ` [PATCH v3 extra 03/11] x86/mm: move and add pv_ prefix to create_pae_xen_mappings Wei Liu
2017-07-30 15:43   ` [PATCH v3 extra 04/11] x86/mm: move disallow_mask variable and macros Wei Liu
2017-07-30 15:43   ` [PATCH v3 extra 05/11] x86/mm: move pv_{alloc, free}_page_type Wei Liu
2017-07-30 15:43   ` [PATCH v3 extra 06/11] x86/mm: move and add pv_ prefix to invalidate_shadow_ldt Wei Liu
2017-07-30 15:43   ` [PATCH v3 extra 07/11] x86/mm: move PV hypercalls to pv/mm-hypercalls.c Wei Liu
2017-07-30 15:43   ` [PATCH v3 extra 08/11] x86/mm: remove the now unused inclusion of pv/mm.h Wei Liu
2017-07-30 15:43   ` [PATCH v3 extra 09/11] x86/mm: use put_page_type_preemptible in put_page_from_l{2, 3}e Wei Liu
2017-07-30 15:43   ` [PATCH v3 extra 10/11] x86/mm: move {get, put}_page_from_l{2, 3, 4}e Wei Liu
2017-07-30 15:43   ` [PATCH v3 extra 11/11] x86/mm: move description of x86 page table API to pv/mm.c Wei Liu
2017-07-31  9:58   ` [PATCH v3 extra 00/11] x86: refactor mm.c: page APIs and hypercalls Jan Beulich
