[PATCH RFC 12/13] x86/mm: split PV MMU code to pv/mm.c

From: Wei Liu @ 2017-03-27 9:10 UTC
To: Xen-devel; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, Jan Beulich

Move the following PV-specific code to the new pv/mm.c:

 1. Three hypercalls that are only available to PV guests:
    1. do_mmu_update
    2. do_update_va_mapping
    3. do_update_va_mapping_otherdomain
 2. PV MMIO emulation code
 3. PV writable page table emulation code
 4. PV grant table creation / destruction code
 5. Other supporting code for the above

Move everything in one patch because these pieces share a lot of code.

Also move the PV page table API comment and delete trailing whitespace.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
xen/arch/x86/mm.c | 1918 +---------------------------------------------
xen/arch/x86/pv/Makefile | 1 +
xen/arch/x86/pv/mm.c | 1902 +++++++++++++++++++++++++++++++++++++++++++++
xen/include/asm-x86/mm.h | 5 +
4 files changed, 1935 insertions(+), 1891 deletions(-)
create mode 100644 xen/arch/x86/pv/mm.c
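
As context for review: the first hypercall in the list above is driven from a
PV guest by handing an array of (ptr, val) requests to the mmu_update
hypercall. A minimal guest-side sketch, assuming a Linux-style PV environment
(HYPERVISOR_mmu_update and the header paths below are the usual guest-side
wrappers, not anything introduced by this series):

    #include <xen/interface/xen.h>   /* struct mmu_update, MMU_NORMAL_PT_UPDATE */
    #include <asm/xen/hypercall.h>   /* HYPERVISOR_mmu_update() wrapper */

    /* Rewrite one PTE, identified by the machine address of the PTE itself. */
    static int set_pte_entry(uint64_t pte_machine_addr, uint64_t new_val)
    {
        struct mmu_update req = {
            /* Low bits of .ptr carry the command; the rest is the PTE's MA. */
            .ptr = pte_machine_addr | MMU_NORMAL_PT_UPDATE,
            .val = new_val,
        };

        /* One request, no success counter, against our own page tables. */
        return HYPERVISOR_mmu_update(&req, 1, NULL, DOMID_SELF);
    }

do_update_va_mapping{,_otherdomain} are the single-entry shortcut for the same
operation, taking a virtual address and UVMF_* TLB-flush flags instead of a
request list.
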
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 92e79d7fb6..0119cacc43 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -18,71 +18,6 @@
* along with this program; If not, see <http://www.gnu.org/licenses/>.
*/
-/*
- * A description of the x86 page table API:
- *
- * Domains trap to do_mmu_update with a list of update requests.
- * This is a list of (ptr, val) pairs, where the requested operation
- * is *ptr = val.
- *
- * Reference counting of pages:
- * ----------------------------
- * Each page has two refcounts: tot_count and type_count.
- *
- * TOT_COUNT is the obvious reference count. It counts all uses of a
- * physical page frame by a domain, including uses as a page directory,
- * a page table, or simple mappings via a PTE. This count prevents a
- * domain from releasing a frame back to the free pool when it still holds
- * a reference to it.
- *
- * TYPE_COUNT is more subtle. A frame can be put to one of three
- * mutually-exclusive uses: it might be used as a page directory, or a
- * page table, or it may be mapped writable by the domain [of course, a
- * frame may not be used in any of these three ways!].
- * So, type_count is a count of the number of times a frame is being
- * referred to in its current incarnation. Therefore, a page can only
- * change its type when its type count is zero.
- *
- * Pinning the page type:
- * ----------------------
- * The type of a page can be pinned/unpinned with the commands
- * MMUEXT_[UN]PIN_L?_TABLE. Each page can be pinned exactly once (that is,
- * pinning is not reference counted, so it can't be nested).
- * This is useful to prevent a page's type count falling to zero, at which
- * point safety checks would need to be carried out next time the count
- * is increased again.
- *
- * A further note on writable page mappings:
- * -----------------------------------------
- * For simplicity, the count of writable mappings for a page may not
- * correspond to reality. The 'writable count' is incremented for every
- * PTE which maps the page with the _PAGE_RW flag set. However, for
- * write access to be possible the page directory entry must also have
- * its _PAGE_RW bit set. We do not check this as it complicates the
- * reference counting considerably [consider the case of multiple
- * directory entries referencing a single page table, some with the RW
- * bit set, others not -- it starts getting a bit messy].
- * In normal use, this simplification shouldn't be a problem.
- * However, the logic can be added if required.
- *
- * One more note on read-only page mappings:
- * -----------------------------------------
- * We want domains to be able to map pages for read-only access. The
- * main reason is that page tables and directories should be readable
- * by a domain, but it would not be safe for them to be writable.
- * However, domains have free access to rings 1 & 2 of the Intel
- * privilege model. In terms of page protection, these are considered
- * to be part of 'supervisor mode'. The WP bit in CR0 controls whether
- * read-only restrictions are respected in supervisor mode -- if the
- * bit is clear then any mapped page is writable.
- *
- * We get round this by always setting the WP bit and disallowing
- * updates to it. This is very unlikely to cause a problem for guest
- * OS's, which will generally use the WP bit to simplify copy-on-write
- * implementation (in that case, OS wants a fault when it writes to
- * an application-supplied buffer).
- */
-
#include <xen/init.h>
#include <xen/kernel.h>
#include <xen/lib.h>
@@ -127,14 +62,6 @@
l1_pgentry_t __section(".bss.page_aligned") __aligned(PAGE_SIZE)
l1_fixmap[L1_PAGETABLE_ENTRIES];
-/*
- * PTE updates can be done with ordinary writes except:
- * 1. Debug builds get extra checking by using CMPXCHG[8B].
- */
-#if !defined(NDEBUG)
-#define PTE_UPDATE_WITH_CMPXCHG
-#endif
-
paddr_t __read_mostly mem_hotplug;
/* Private domain structs for DOMID_XEN and DOMID_IO. */
@@ -520,67 +447,6 @@ void update_cr3(struct vcpu *v)
make_cr3(v, cr3_mfn);
}
-/* Get a mapping of a PV guest's l1e for this virtual address. */
-static l1_pgentry_t *guest_map_l1e(unsigned long addr, unsigned long *gl1mfn)
-{
- l2_pgentry_t l2e;
-
- ASSERT(!paging_mode_translate(current->domain));
- ASSERT(!paging_mode_external(current->domain));
-
- if ( unlikely(!__addr_ok(addr)) )
- return NULL;
-
- /* Find this l1e and its enclosing l1mfn in the linear map. */
- if ( __copy_from_user(&l2e,
- &__linear_l2_table[l2_linear_offset(addr)],
- sizeof(l2_pgentry_t)) )
- return NULL;
-
- /* Check flags that it will be safe to read the l1e. */
- if ( (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT )
- return NULL;
-
- *gl1mfn = l2e_get_pfn(l2e);
-
- return (l1_pgentry_t *)map_domain_page(_mfn(*gl1mfn)) +
- l1_table_offset(addr);
-}
-
-/* Pull down the mapping we got from guest_map_l1e(). */
-static inline void guest_unmap_l1e(void *p)
-{
- unmap_domain_page(p);
-}
-
-/* Read a PV guest's l1e that maps this virtual address. */
-static inline void guest_get_eff_l1e(unsigned long addr, l1_pgentry_t *eff_l1e)
-{
- ASSERT(!paging_mode_translate(current->domain));
- ASSERT(!paging_mode_external(current->domain));
-
- if ( unlikely(!__addr_ok(addr)) ||
- __copy_from_user(eff_l1e,
- &__linear_l1_table[l1_linear_offset(addr)],
- sizeof(l1_pgentry_t)) )
- *eff_l1e = l1e_empty();
-}
-
-/*
- * Read the guest's l1e that maps this address, from the kernel-mode
- * page tables.
- */
-static inline void guest_get_eff_kern_l1e(struct vcpu *v, unsigned long addr,
- void *eff_l1e)
-{
- bool_t user_mode = !(v->arch.flags & TF_kernel_mode);
-#define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
-
- TOGGLE_MODE();
- guest_get_eff_l1e(addr, eff_l1e);
- TOGGLE_MODE();
-}
-
const char __section(".bss.page_aligned.const") __aligned(PAGE_SIZE)
zero_page[PAGE_SIZE];
@@ -635,49 +501,6 @@ static int alloc_segdesc_page(struct page_info *page)
return i == 512 ? 0 : -EINVAL;
}
-
-/* Map shadow page at offset @off. */
-int map_ldt_shadow_page(unsigned int off)
-{
- struct vcpu *v = current;
- struct domain *d = v->domain;
- unsigned long gmfn;
- struct page_info *page;
- l1_pgentry_t l1e, nl1e;
- unsigned long gva = v->arch.pv_vcpu.ldt_base + (off << PAGE_SHIFT);
- int okay;
-
- BUG_ON(unlikely(in_irq()));
-
- if ( is_pv_32bit_domain(d) )
- gva = (u32)gva;
- guest_get_eff_kern_l1e(v, gva, &l1e);
- if ( unlikely(!(l1e_get_flags(l1e) & _PAGE_PRESENT)) )
- return 0;
-
- gmfn = l1e_get_pfn(l1e);
- page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
- if ( unlikely(!page) )
- return 0;
-
- okay = get_page_type(page, PGT_seg_desc_page);
- if ( unlikely(!okay) )
- {
- put_page(page);
- return 0;
- }
-
- nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(l1e) | _PAGE_RW);
-
- spin_lock(&v->arch.pv_vcpu.shadow_ldt_lock);
- l1e_write(&gdt_ldt_ptes(d, v)[off + 16], nl1e);
- v->arch.pv_vcpu.shadow_ldt_mapcnt++;
- spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock);
-
- return 1;
-}
-
-
int get_page_from_pagenr(unsigned long page_nr, struct domain *d)
{
struct page_info *page = mfn_to_page(page_nr);
@@ -1744,344 +1567,6 @@ void page_unlock(struct page_info *page)
} while ( (y = cmpxchg(&page->u.inuse.type_info, x, nx)) != x );
}
-/* How to write an entry to the guest pagetables.
- * Returns 0 for failure (pointer not valid), 1 for success. */
-static inline int update_intpte(intpte_t *p,
- intpte_t old,
- intpte_t new,
- unsigned long mfn,
- struct vcpu *v,
- int preserve_ad)
-{
- int rv = 1;
-#ifndef PTE_UPDATE_WITH_CMPXCHG
- if ( !preserve_ad )
- {
- rv = paging_write_guest_entry(v, p, new, _mfn(mfn));
- }
- else
-#endif
- {
- intpte_t t = old;
- for ( ; ; )
- {
- intpte_t _new = new;
- if ( preserve_ad )
- _new |= old & (_PAGE_ACCESSED | _PAGE_DIRTY);
-
- rv = paging_cmpxchg_guest_entry(v, p, &t, _new, _mfn(mfn));
- if ( unlikely(rv == 0) )
- {
- MEM_LOG("Failed to update %" PRIpte " -> %" PRIpte
- ": saw %" PRIpte, old, _new, t);
- break;
- }
-
- if ( t == old )
- break;
-
- /* Allowed to change in Accessed/Dirty flags only. */
- BUG_ON((t ^ old) & ~(intpte_t)(_PAGE_ACCESSED|_PAGE_DIRTY));
-
- old = t;
- }
- }
- return rv;
-}
-
-/* Macro that wraps the appropriate type-changes around update_intpte().
- * Arguments are: type, ptr, old, new, mfn, vcpu */
-#define UPDATE_ENTRY(_t,_p,_o,_n,_m,_v,_ad) \
- update_intpte(&_t ## e_get_intpte(*(_p)), \
- _t ## e_get_intpte(_o), _t ## e_get_intpte(_n), \
- (_m), (_v), (_ad))
-
-/*
- * PTE flags that a guest may change without re-validating the PTE.
- * All other bits affect translation, caching, or Xen's safety.
- */
-#define FASTPATH_FLAG_WHITELIST \
- (_PAGE_NX_BIT | _PAGE_AVAIL_HIGH | _PAGE_AVAIL | _PAGE_GLOBAL | \
- _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_USER)
-
-/* Update the L1 entry at pl1e to new value nl1e. */
-static int mod_l1_entry(l1_pgentry_t *pl1e, l1_pgentry_t nl1e,
- unsigned long gl1mfn, int preserve_ad,
- struct vcpu *pt_vcpu, struct domain *pg_dom)
-{
- l1_pgentry_t ol1e;
- struct domain *pt_dom = pt_vcpu->domain;
- int rc = 0;
-
- if ( unlikely(__copy_from_user(&ol1e, pl1e, sizeof(ol1e)) != 0) )
- return -EFAULT;
-
- if ( unlikely(paging_mode_refcounts(pt_dom)) )
- {
- if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, preserve_ad) )
- return 0;
- return -EBUSY;
- }
-
- if ( l1e_get_flags(nl1e) & _PAGE_PRESENT )
- {
- /* Translate foreign guest addresses. */
- struct page_info *page = NULL;
-
- if ( unlikely(l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom)) )
- {
- MEM_LOG("Bad L1 flags %x",
- l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom));
- return -EINVAL;
- }
-
- if ( paging_mode_translate(pg_dom) )
- {
- page = get_page_from_gfn(pg_dom, l1e_get_pfn(nl1e), NULL, P2M_ALLOC);
- if ( !page )
- return -EINVAL;
- nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(nl1e));
- }
-
- /* Fast path for sufficiently-similar mappings. */
- if ( !l1e_has_changed(ol1e, nl1e, ~FASTPATH_FLAG_WHITELIST) )
- {
- adjust_guest_l1e(nl1e, pt_dom);
- rc = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
- preserve_ad);
- if ( page )
- put_page(page);
- return rc ? 0 : -EBUSY;
- }
-
- switch ( rc = get_page_from_l1e(nl1e, pt_dom, pg_dom) )
- {
- default:
- if ( page )
- put_page(page);
- return rc;
- case 0:
- break;
- case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
- ASSERT(!(rc & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
- l1e_flip_flags(nl1e, rc);
- rc = 0;
- break;
- }
- if ( page )
- put_page(page);
-
- adjust_guest_l1e(nl1e, pt_dom);
- if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
- preserve_ad)) )
- {
- ol1e = nl1e;
- rc = -EBUSY;
- }
- }
- else if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
- preserve_ad)) )
- {
- return -EBUSY;
- }
-
- put_page_from_l1e(ol1e, pt_dom);
- return rc;
-}
-
-
-/* Update the L2 entry at pl2e to new value nl2e. pl2e is within frame pfn. */
-static int mod_l2_entry(l2_pgentry_t *pl2e,
- l2_pgentry_t nl2e,
- unsigned long pfn,
- int preserve_ad,
- struct vcpu *vcpu)
-{
- l2_pgentry_t ol2e;
- struct domain *d = vcpu->domain;
- struct page_info *l2pg = mfn_to_page(pfn);
- unsigned long type = l2pg->u.inuse.type_info;
- int rc = 0;
-
- if ( unlikely(!is_guest_l2_slot(d, type, pgentry_ptr_to_slot(pl2e))) )
- {
- MEM_LOG("Illegal L2 update attempt in Xen-private area %p", pl2e);
- return -EPERM;
- }
-
- if ( unlikely(__copy_from_user(&ol2e, pl2e, sizeof(ol2e)) != 0) )
- return -EFAULT;
-
- if ( l2e_get_flags(nl2e) & _PAGE_PRESENT )
- {
- if ( unlikely(l2e_get_flags(nl2e) & L2_DISALLOW_MASK) )
- {
- MEM_LOG("Bad L2 flags %x",
- l2e_get_flags(nl2e) & L2_DISALLOW_MASK);
- return -EINVAL;
- }
-
- /* Fast path for sufficiently-similar mappings. */
- if ( !l2e_has_changed(ol2e, nl2e, ~FASTPATH_FLAG_WHITELIST) )
- {
- adjust_guest_l2e(nl2e, d);
- if ( UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu, preserve_ad) )
- return 0;
- return -EBUSY;
- }
-
- if ( unlikely((rc = get_page_from_l2e(nl2e, pfn, d)) < 0) )
- return rc;
-
- adjust_guest_l2e(nl2e, d);
- if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu,
- preserve_ad)) )
- {
- ol2e = nl2e;
- rc = -EBUSY;
- }
- }
- else if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu,
- preserve_ad)) )
- {
- return -EBUSY;
- }
-
- put_page_from_l2e(ol2e, pfn);
- return rc;
-}
-
-/* Update the L3 entry at pl3e to new value nl3e. pl3e is within frame pfn. */
-static int mod_l3_entry(l3_pgentry_t *pl3e,
- l3_pgentry_t nl3e,
- unsigned long pfn,
- int preserve_ad,
- struct vcpu *vcpu)
-{
- l3_pgentry_t ol3e;
- struct domain *d = vcpu->domain;
- int rc = 0;
-
- if ( unlikely(!is_guest_l3_slot(pgentry_ptr_to_slot(pl3e))) )
- {
- MEM_LOG("Illegal L3 update attempt in Xen-private area %p", pl3e);
- return -EINVAL;
- }
-
- /*
- * Disallow updates to final L3 slot. It contains Xen mappings, and it
- * would be a pain to ensure they remain continuously valid throughout.
- */
- if ( is_pv_32bit_domain(d) && (pgentry_ptr_to_slot(pl3e) >= 3) )
- return -EINVAL;
-
- if ( unlikely(__copy_from_user(&ol3e, pl3e, sizeof(ol3e)) != 0) )
- return -EFAULT;
-
- if ( l3e_get_flags(nl3e) & _PAGE_PRESENT )
- {
- if ( unlikely(l3e_get_flags(nl3e) & l3_disallow_mask(d)) )
- {
- MEM_LOG("Bad L3 flags %x",
- l3e_get_flags(nl3e) & l3_disallow_mask(d));
- return -EINVAL;
- }
-
- /* Fast path for sufficiently-similar mappings. */
- if ( !l3e_has_changed(ol3e, nl3e, ~FASTPATH_FLAG_WHITELIST) )
- {
- adjust_guest_l3e(nl3e, d);
- rc = UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu, preserve_ad);
- return rc ? 0 : -EFAULT;
- }
-
- rc = get_page_from_l3e(nl3e, pfn, d, 0);
- if ( unlikely(rc < 0) )
- return rc;
- rc = 0;
-
- adjust_guest_l3e(nl3e, d);
- if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu,
- preserve_ad)) )
- {
- ol3e = nl3e;
- rc = -EFAULT;
- }
- }
- else if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu,
- preserve_ad)) )
- {
- return -EFAULT;
- }
-
- if ( likely(rc == 0) )
- if ( !create_pae_xen_mappings(d, pl3e) )
- BUG();
-
- put_page_from_l3e(ol3e, pfn, 0, 1);
- return rc;
-}
-
-/* Update the L4 entry at pl4e to new value nl4e. pl4e is within frame pfn. */
-static int mod_l4_entry(l4_pgentry_t *pl4e,
- l4_pgentry_t nl4e,
- unsigned long pfn,
- int preserve_ad,
- struct vcpu *vcpu)
-{
- struct domain *d = vcpu->domain;
- l4_pgentry_t ol4e;
- int rc = 0;
-
- if ( unlikely(!is_guest_l4_slot(d, pgentry_ptr_to_slot(pl4e))) )
- {
- MEM_LOG("Illegal L4 update attempt in Xen-private area %p", pl4e);
- return -EINVAL;
- }
-
- if ( unlikely(__copy_from_user(&ol4e, pl4e, sizeof(ol4e)) != 0) )
- return -EFAULT;
-
- if ( l4e_get_flags(nl4e) & _PAGE_PRESENT )
- {
- if ( unlikely(l4e_get_flags(nl4e) & L4_DISALLOW_MASK) )
- {
- MEM_LOG("Bad L4 flags %x",
- l4e_get_flags(nl4e) & L4_DISALLOW_MASK);
- return -EINVAL;
- }
-
- /* Fast path for sufficiently-similar mappings. */
- if ( !l4e_has_changed(ol4e, nl4e, ~FASTPATH_FLAG_WHITELIST) )
- {
- adjust_guest_l4e(nl4e, d);
- rc = UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu, preserve_ad);
- return rc ? 0 : -EFAULT;
- }
-
- rc = get_page_from_l4e(nl4e, pfn, d, 0);
- if ( unlikely(rc < 0) )
- return rc;
- rc = 0;
-
- adjust_guest_l4e(nl4e, d);
- if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu,
- preserve_ad)) )
- {
- ol4e = nl4e;
- rc = -EFAULT;
- }
- }
- else if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu,
- preserve_ad)) )
- {
- return -EFAULT;
- }
-
- put_page_from_l4e(ol4e, pfn, 0, 1);
- return rc;
-}
-
static int cleanup_page_cacheattr(struct page_info *page)
{
unsigned int cacheattr =
@@ -2849,125 +2334,23 @@ int vcpu_destroy_pagetables(struct vcpu *v)
return rc != -EINTR ? rc : -ERESTART;
}
-int new_guest_cr3(unsigned long mfn)
+struct domain *mm_get_pg_owner(domid_t domid)
{
- struct vcpu *curr = current;
- struct domain *d = curr->domain;
- int rc;
- unsigned long old_base_mfn;
+ struct domain *pg_owner = NULL, *curr = current->domain;
- if ( is_pv_32bit_domain(d) )
+ if ( likely(domid == DOMID_SELF) )
{
- unsigned long gt_mfn = pagetable_get_pfn(curr->arch.guest_table);
- l4_pgentry_t *pl4e = map_domain_page(_mfn(gt_mfn));
-
- rc = paging_mode_refcounts(d)
- ? -EINVAL /* Old code was broken, but what should it be? */
- : mod_l4_entry(
- pl4e,
- l4e_from_pfn(
- mfn,
- (_PAGE_PRESENT|_PAGE_RW|_PAGE_USER|_PAGE_ACCESSED)),
- gt_mfn, 0, curr);
- unmap_domain_page(pl4e);
- switch ( rc )
- {
- case 0:
- break;
- case -EINTR:
- case -ERESTART:
- return -ERESTART;
- default:
- MEM_LOG("Error while installing new compat baseptr %lx", mfn);
- return rc;
- }
-
- invalidate_shadow_ldt(curr, 0);
- write_ptbase(curr);
-
- return 0;
+ pg_owner = rcu_lock_current_domain();
+ goto out;
}
- rc = put_old_guest_table(curr);
- if ( unlikely(rc) )
- return rc;
-
- old_base_mfn = pagetable_get_pfn(curr->arch.guest_table);
- /*
- * This is particularly important when getting restarted after the
- * previous attempt got preempted in the put-old-MFN phase.
- */
- if ( old_base_mfn == mfn )
+ if ( unlikely(domid == curr->domain_id) )
{
- write_ptbase(curr);
- return 0;
+ MEM_LOG("Cannot specify itself as foreign domain");
+ goto out;
}
- rc = paging_mode_refcounts(d)
- ? (get_page_from_pagenr(mfn, d) ? 0 : -EINVAL)
- : get_page_and_type_from_pagenr(mfn, PGT_root_page_table, d, 0, 1);
- switch ( rc )
- {
- case 0:
- break;
- case -EINTR:
- case -ERESTART:
- return -ERESTART;
- default:
- MEM_LOG("Error while installing new baseptr %lx", mfn);
- return rc;
- }
-
- invalidate_shadow_ldt(curr, 0);
-
- if ( !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) )
- fill_ro_mpt(mfn);
- curr->arch.guest_table = pagetable_from_pfn(mfn);
- update_cr3(curr);
-
- write_ptbase(curr);
-
- if ( likely(old_base_mfn != 0) )
- {
- struct page_info *page = mfn_to_page(old_base_mfn);
-
- if ( paging_mode_refcounts(d) )
- put_page(page);
- else
- switch ( rc = put_page_and_type_preemptible(page) )
- {
- case -EINTR:
- rc = -ERESTART;
- /* fallthrough */
- case -ERESTART:
- curr->arch.old_guest_table = page;
- break;
- default:
- BUG_ON(rc);
- break;
- }
- }
-
- return rc;
-}
-
-struct domain *mm_get_pg_owner(domid_t domid)
-{
- struct domain *pg_owner = NULL, *curr = current->domain;
-
- if ( likely(domid == DOMID_SELF) )
- {
- pg_owner = rcu_lock_current_domain();
- goto out;
- }
-
- if ( unlikely(domid == curr->domain_id) )
- {
- MEM_LOG("Cannot specify itself as foreign domain");
- goto out;
- }
-
- if ( !is_hvm_domain(curr) && unlikely(paging_mode_translate(curr)) )
+ if ( !is_hvm_domain(curr) && unlikely(paging_mode_translate(curr)) )
{
MEM_LOG("Cannot mix foreign mappings with translated domains");
goto out;
@@ -3581,572 +2964,6 @@ long do_mmuext_op(
return rc;
}
-long do_mmu_update(
- XEN_GUEST_HANDLE_PARAM(mmu_update_t) ureqs,
- unsigned int count,
- XEN_GUEST_HANDLE_PARAM(uint) pdone,
- unsigned int foreigndom)
-{
- struct mmu_update req;
- void *va;
- unsigned long gpfn, gmfn, mfn;
- struct page_info *page;
- unsigned int cmd, i = 0, done = 0, pt_dom;
- struct vcpu *curr = current, *v = curr;
- struct domain *d = v->domain, *pt_owner = d, *pg_owner;
- struct domain_mmap_cache mapcache;
- uint32_t xsm_needed = 0;
- uint32_t xsm_checked = 0;
- int rc = put_old_guest_table(curr);
-
- if ( unlikely(rc) )
- {
- if ( likely(rc == -ERESTART) )
- rc = hypercall_create_continuation(
- __HYPERVISOR_mmu_update, "hihi", ureqs, count, pdone,
- foreigndom);
- return rc;
- }
-
- if ( unlikely(count == MMU_UPDATE_PREEMPTED) &&
- likely(guest_handle_is_null(ureqs)) )
- {
- /* See the curr->arch.old_guest_table related
- * hypercall_create_continuation() below. */
- return (int)foreigndom;
- }
-
- if ( unlikely(count & MMU_UPDATE_PREEMPTED) )
- {
- count &= ~MMU_UPDATE_PREEMPTED;
- if ( unlikely(!guest_handle_is_null(pdone)) )
- (void)copy_from_guest(&done, pdone, 1);
- }
- else
- perfc_incr(calls_to_mmu_update);
-
- if ( unlikely(!guest_handle_okay(ureqs, count)) )
- return -EFAULT;
-
- if ( (pt_dom = foreigndom >> 16) != 0 )
- {
- /* Pagetables belong to a foreign domain (PFD). */
- if ( (pt_owner = rcu_lock_domain_by_id(pt_dom - 1)) == NULL )
- return -ESRCH;
-
- if ( pt_owner == d )
- rcu_unlock_domain(pt_owner);
- else if ( !pt_owner->vcpu || (v = pt_owner->vcpu[0]) == NULL )
- {
- rc = -EINVAL;
- goto out;
- }
- }
-
- if ( (pg_owner = mm_get_pg_owner((uint16_t)foreigndom)) == NULL )
- {
- rc = -ESRCH;
- goto out;
- }
-
- domain_mmap_cache_init(&mapcache);
-
- for ( i = 0; i < count; i++ )
- {
- if ( curr->arch.old_guest_table || (i && hypercall_preempt_check()) )
- {
- rc = -ERESTART;
- break;
- }
-
- if ( unlikely(__copy_from_guest(&req, ureqs, 1) != 0) )
- {
- MEM_LOG("Bad __copy_from_guest");
- rc = -EFAULT;
- break;
- }
-
- cmd = req.ptr & (sizeof(l1_pgentry_t)-1);
-
- switch ( cmd )
- {
- /*
- * MMU_NORMAL_PT_UPDATE: Normal update to any level of page table.
- * MMU_UPDATE_PT_PRESERVE_AD: As above but also preserve (OR)
- * current A/D bits.
- */
- case MMU_NORMAL_PT_UPDATE:
- case MMU_PT_UPDATE_PRESERVE_AD:
- {
- p2m_type_t p2mt;
-
- rc = -EOPNOTSUPP;
- if ( unlikely(paging_mode_refcounts(pt_owner)) )
- break;
-
- xsm_needed |= XSM_MMU_NORMAL_UPDATE;
- if ( get_pte_flags(req.val) & _PAGE_PRESENT )
- {
- xsm_needed |= XSM_MMU_UPDATE_READ;
- if ( get_pte_flags(req.val) & _PAGE_RW )
- xsm_needed |= XSM_MMU_UPDATE_WRITE;
- }
- if ( xsm_needed != xsm_checked )
- {
- rc = xsm_mmu_update(XSM_TARGET, d, pt_owner, pg_owner, xsm_needed);
- if ( rc )
- break;
- xsm_checked = xsm_needed;
- }
- rc = -EINVAL;
-
- req.ptr -= cmd;
- gmfn = req.ptr >> PAGE_SHIFT;
- page = get_page_from_gfn(pt_owner, gmfn, &p2mt, P2M_ALLOC);
-
- if ( p2m_is_paged(p2mt) )
- {
- ASSERT(!page);
- p2m_mem_paging_populate(pg_owner, gmfn);
- rc = -ENOENT;
- break;
- }
-
- if ( unlikely(!page) )
- {
- MEM_LOG("Could not get page for normal update");
- break;
- }
-
- mfn = page_to_mfn(page);
- va = map_domain_page_with_cache(mfn, &mapcache);
- va = (void *)((unsigned long)va +
- (unsigned long)(req.ptr & ~PAGE_MASK));
-
- if ( page_lock(page) )
- {
- switch ( page->u.inuse.type_info & PGT_type_mask )
- {
- case PGT_l1_page_table:
- {
- l1_pgentry_t l1e = l1e_from_intpte(req.val);
- p2m_type_t l1e_p2mt = p2m_ram_rw;
- struct page_info *target = NULL;
- p2m_query_t q = (l1e_get_flags(l1e) & _PAGE_RW) ?
- P2M_UNSHARE : P2M_ALLOC;
-
- if ( paging_mode_translate(pg_owner) )
- target = get_page_from_gfn(pg_owner, l1e_get_pfn(l1e),
- &l1e_p2mt, q);
-
- if ( p2m_is_paged(l1e_p2mt) )
- {
- if ( target )
- put_page(target);
- p2m_mem_paging_populate(pg_owner, l1e_get_pfn(l1e));
- rc = -ENOENT;
- break;
- }
- else if ( p2m_ram_paging_in == l1e_p2mt && !target )
- {
- rc = -ENOENT;
- break;
- }
- /* If we tried to unshare and failed */
- else if ( (q & P2M_UNSHARE) && p2m_is_shared(l1e_p2mt) )
- {
- /* We could not have obtained a page ref. */
- ASSERT(target == NULL);
- /* And mem_sharing_notify has already been called. */
- rc = -ENOMEM;
- break;
- }
-
- rc = mod_l1_entry(va, l1e, mfn,
- cmd == MMU_PT_UPDATE_PRESERVE_AD, v,
- pg_owner);
- if ( target )
- put_page(target);
- }
- break;
- case PGT_l2_page_table:
- rc = mod_l2_entry(va, l2e_from_intpte(req.val), mfn,
- cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
- break;
- case PGT_l3_page_table:
- rc = mod_l3_entry(va, l3e_from_intpte(req.val), mfn,
- cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
- break;
- case PGT_l4_page_table:
- rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
- cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
- break;
- case PGT_writable_page:
- perfc_incr(writable_mmu_updates);
- if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) )
- rc = 0;
- break;
- }
- page_unlock(page);
- if ( rc == -EINTR )
- rc = -ERESTART;
- }
- else if ( get_page_type(page, PGT_writable_page) )
- {
- perfc_incr(writable_mmu_updates);
- if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) )
- rc = 0;
- put_page_type(page);
- }
-
- unmap_domain_page_with_cache(va, &mapcache);
- put_page(page);
- }
- break;
-
- case MMU_MACHPHYS_UPDATE:
- if ( unlikely(d != pt_owner) )
- {
- rc = -EPERM;
- break;
- }
-
- if ( unlikely(paging_mode_translate(pg_owner)) )
- {
- rc = -EINVAL;
- break;
- }
-
- mfn = req.ptr >> PAGE_SHIFT;
- gpfn = req.val;
-
- xsm_needed |= XSM_MMU_MACHPHYS_UPDATE;
- if ( xsm_needed != xsm_checked )
- {
- rc = xsm_mmu_update(XSM_TARGET, d, NULL, pg_owner, xsm_needed);
- if ( rc )
- break;
- xsm_checked = xsm_needed;
- }
-
- if ( unlikely(!get_page_from_pagenr(mfn, pg_owner)) )
- {
- MEM_LOG("Could not get page for mach->phys update");
- rc = -EINVAL;
- break;
- }
-
- set_gpfn_from_mfn(mfn, gpfn);
-
- paging_mark_dirty(pg_owner, _mfn(mfn));
-
- put_page(mfn_to_page(mfn));
- break;
-
- default:
- MEM_LOG("Invalid page update command %x", cmd);
- rc = -ENOSYS;
- break;
- }
-
- if ( unlikely(rc) )
- break;
-
- guest_handle_add_offset(ureqs, 1);
- }
-
- if ( rc == -ERESTART )
- {
- ASSERT(i < count);
- rc = hypercall_create_continuation(
- __HYPERVISOR_mmu_update, "hihi",
- ureqs, (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom);
- }
- else if ( curr->arch.old_guest_table )
- {
- XEN_GUEST_HANDLE_PARAM(void) null;
-
- ASSERT(rc || i == count);
- set_xen_guest_handle(null, NULL);
- /*
- * In order to have a way to communicate the final return value to
- * our continuation, we pass this in place of "foreigndom", building
- * on the fact that this argument isn't needed anymore.
- */
- rc = hypercall_create_continuation(
- __HYPERVISOR_mmu_update, "hihi", null,
- MMU_UPDATE_PREEMPTED, null, rc);
- }
-
- mm_put_pg_owner(pg_owner);
-
- domain_mmap_cache_destroy(&mapcache);
-
- perfc_add(num_page_updates, i);
-
- out:
- if ( pt_owner != d )
- rcu_unlock_domain(pt_owner);
-
- /* Add incremental work we have done to the @done output parameter. */
- if ( unlikely(!guest_handle_is_null(pdone)) )
- {
- done += i;
- copy_to_guest(pdone, &done, 1);
- }
-
- return rc;
-}
-
-
-static int create_grant_pte_mapping(
- uint64_t pte_addr, l1_pgentry_t nl1e, struct vcpu *v)
-{
- int rc = GNTST_okay;
- void *va;
- unsigned long gmfn, mfn;
- struct page_info *page;
- l1_pgentry_t ol1e;
- struct domain *d = v->domain;
-
- adjust_guest_l1e(nl1e, d);
-
- gmfn = pte_addr >> PAGE_SHIFT;
- page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
-
- if ( unlikely(!page) )
- {
- MEM_LOG("Could not get page for normal update");
- return GNTST_general_error;
- }
-
- mfn = page_to_mfn(page);
- va = map_domain_page(_mfn(mfn));
- va = (void *)((unsigned long)va + ((unsigned long)pte_addr & ~PAGE_MASK));
-
- if ( !page_lock(page) )
- {
- rc = GNTST_general_error;
- goto failed;
- }
-
- if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
- {
- page_unlock(page);
- rc = GNTST_general_error;
- goto failed;
- }
-
- ol1e = *(l1_pgentry_t *)va;
- if ( !UPDATE_ENTRY(l1, (l1_pgentry_t *)va, ol1e, nl1e, mfn, v, 0) )
- {
- page_unlock(page);
- rc = GNTST_general_error;
- goto failed;
- }
-
- page_unlock(page);
-
- if ( !paging_mode_refcounts(d) )
- put_page_from_l1e(ol1e, d);
-
- failed:
- unmap_domain_page(va);
- put_page(page);
-
- return rc;
-}
-
-static int destroy_grant_pte_mapping(
- uint64_t addr, unsigned long frame, struct domain *d)
-{
- int rc = GNTST_okay;
- void *va;
- unsigned long gmfn, mfn;
- struct page_info *page;
- l1_pgentry_t ol1e;
-
- gmfn = addr >> PAGE_SHIFT;
- page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
-
- if ( unlikely(!page) )
- {
- MEM_LOG("Could not get page for normal update");
- return GNTST_general_error;
- }
-
- mfn = page_to_mfn(page);
- va = map_domain_page(_mfn(mfn));
- va = (void *)((unsigned long)va + ((unsigned long)addr & ~PAGE_MASK));
-
- if ( !page_lock(page) )
- {
- rc = GNTST_general_error;
- goto failed;
- }
-
- if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
- {
- page_unlock(page);
- rc = GNTST_general_error;
- goto failed;
- }
-
- ol1e = *(l1_pgentry_t *)va;
-
- /* Check that the virtual address supplied is actually mapped to frame. */
- if ( unlikely(l1e_get_pfn(ol1e) != frame) )
- {
- page_unlock(page);
- MEM_LOG("PTE entry %lx for address %"PRIx64" doesn't match frame %lx",
- (unsigned long)l1e_get_intpte(ol1e), addr, frame);
- rc = GNTST_general_error;
- goto failed;
- }
-
- /* Delete pagetable entry. */
- if ( unlikely(!UPDATE_ENTRY
- (l1,
- (l1_pgentry_t *)va, ol1e, l1e_empty(), mfn,
- d->vcpu[0] /* Change if we go to per-vcpu shadows. */,
- 0)) )
- {
- page_unlock(page);
- MEM_LOG("Cannot delete PTE entry at %p", va);
- rc = GNTST_general_error;
- goto failed;
- }
-
- page_unlock(page);
-
- failed:
- unmap_domain_page(va);
- put_page(page);
- return rc;
-}
-
-
-static int create_grant_va_mapping(
- unsigned long va, l1_pgentry_t nl1e, struct vcpu *v)
-{
- l1_pgentry_t *pl1e, ol1e;
- struct domain *d = v->domain;
- unsigned long gl1mfn;
- struct page_info *l1pg;
- int okay;
-
- adjust_guest_l1e(nl1e, d);
-
- pl1e = guest_map_l1e(va, &gl1mfn);
- if ( !pl1e )
- {
- MEM_LOG("Could not find L1 PTE for address %lx", va);
- return GNTST_general_error;
- }
-
- if ( !get_page_from_pagenr(gl1mfn, current->domain) )
- {
- guest_unmap_l1e(pl1e);
- return GNTST_general_error;
- }
-
- l1pg = mfn_to_page(gl1mfn);
- if ( !page_lock(l1pg) )
- {
- put_page(l1pg);
- guest_unmap_l1e(pl1e);
- return GNTST_general_error;
- }
-
- if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
- {
- page_unlock(l1pg);
- put_page(l1pg);
- guest_unmap_l1e(pl1e);
- return GNTST_general_error;
- }
-
- ol1e = *pl1e;
- okay = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0);
-
- page_unlock(l1pg);
- put_page(l1pg);
- guest_unmap_l1e(pl1e);
-
- if ( okay && !paging_mode_refcounts(d) )
- put_page_from_l1e(ol1e, d);
-
- return okay ? GNTST_okay : GNTST_general_error;
-}
-
-static int replace_grant_va_mapping(
- unsigned long addr, unsigned long frame, l1_pgentry_t nl1e, struct vcpu *v)
-{
- l1_pgentry_t *pl1e, ol1e;
- unsigned long gl1mfn;
- struct page_info *l1pg;
- int rc = 0;
-
- pl1e = guest_map_l1e(addr, &gl1mfn);
- if ( !pl1e )
- {
- MEM_LOG("Could not find L1 PTE for address %lx", addr);
- return GNTST_general_error;
- }
-
- if ( !get_page_from_pagenr(gl1mfn, current->domain) )
- {
- rc = GNTST_general_error;
- goto out;
- }
-
- l1pg = mfn_to_page(gl1mfn);
- if ( !page_lock(l1pg) )
- {
- rc = GNTST_general_error;
- put_page(l1pg);
- goto out;
- }
-
- if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
- {
- rc = GNTST_general_error;
- goto unlock_and_out;
- }
-
- ol1e = *pl1e;
-
- /* Check that the virtual address supplied is actually mapped to frame. */
- if ( unlikely(l1e_get_pfn(ol1e) != frame) )
- {
- MEM_LOG("PTE entry %lx for address %lx doesn't match frame %lx",
- l1e_get_pfn(ol1e), addr, frame);
- rc = GNTST_general_error;
- goto unlock_and_out;
- }
-
- /* Delete pagetable entry. */
- if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0)) )
- {
- MEM_LOG("Cannot delete PTE entry at %p", (unsigned long *)pl1e);
- rc = GNTST_general_error;
- goto unlock_and_out;
- }
-
- unlock_and_out:
- page_unlock(l1pg);
- put_page(l1pg);
- out:
- guest_unmap_l1e(pl1e);
- return rc;
-}
-
-static int destroy_grant_va_mapping(
- unsigned long addr, unsigned long frame, struct vcpu *v)
-{
- return replace_grant_va_mapping(addr, frame, l1e_empty(), v);
-}
-
static int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
unsigned int flags,
unsigned int cache_flags)
@@ -4170,140 +2987,38 @@ static int create_grant_p2m_mapping(uint64_t addr, unsigned long frame,
return GNTST_okay;
}
-static int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
- unsigned int flags, unsigned int cache_flags)
-{
- l1_pgentry_t pte;
- uint32_t grant_pte_flags;
-
- grant_pte_flags =
- _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_GNTTAB;
- if ( cpu_has_nx )
- grant_pte_flags |= _PAGE_NX_BIT;
-
- pte = l1e_from_pfn(frame, grant_pte_flags);
- if ( (flags & GNTMAP_application_map) )
- l1e_add_flags(pte,_PAGE_USER);
- if ( !(flags & GNTMAP_readonly) )
- l1e_add_flags(pte,_PAGE_RW);
-
- l1e_add_flags(pte,
- ((flags >> _GNTMAP_guest_avail0) * _PAGE_AVAIL0)
- & _PAGE_AVAIL);
-
- l1e_add_flags(pte, cacheattr_to_pte_flags(cache_flags >> 5));
-
- if ( flags & GNTMAP_contains_pte )
- return create_grant_pte_mapping(addr, pte, current);
- return create_grant_va_mapping(addr, pte, current);
-}
-
int create_grant_host_mapping(uint64_t addr, unsigned long frame,
unsigned int flags, unsigned int cache_flags)
{
if ( paging_mode_external(current->domain) )
return create_grant_p2m_mapping(addr, frame, flags, cache_flags);
- return create_grant_pv_mapping(addr, frame, flags, cache_flags);
-}
-
-static int replace_grant_p2m_mapping(
- uint64_t addr, unsigned long frame, uint64_t new_addr, unsigned int flags)
-{
- unsigned long gfn = (unsigned long)(addr >> PAGE_SHIFT);
- p2m_type_t type;
- mfn_t old_mfn;
- struct domain *d = current->domain;
-
- if ( new_addr != 0 || (flags & GNTMAP_contains_pte) )
- return GNTST_general_error;
-
- old_mfn = get_gfn(d, gfn, &type);
- if ( !p2m_is_grant(type) || mfn_x(old_mfn) != frame )
- {
- put_gfn(d, gfn);
- MEM_LOG("replace_grant_p2m_mapping: old mapping invalid (type %d, mfn %lx, frame %lx)",
- type, mfn_x(old_mfn), frame);
- return GNTST_general_error;
- }
- guest_physmap_remove_page(d, _gfn(gfn), _mfn(frame), PAGE_ORDER_4K);
-
- put_gfn(d, gfn);
- return GNTST_okay;
-}
-
-static int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
- uint64_t new_addr, unsigned int flags)
-{
- struct vcpu *curr = current;
- l1_pgentry_t *pl1e, ol1e;
- unsigned long gl1mfn;
- struct page_info *l1pg;
- int rc;
-
- if ( flags & GNTMAP_contains_pte )
- {
- if ( !new_addr )
- return destroy_grant_pte_mapping(addr, frame, curr->domain);
-
- MEM_LOG("Unsupported grant table operation");
- return GNTST_general_error;
- }
-
- if ( !new_addr )
- return destroy_grant_va_mapping(addr, frame, curr);
-
- pl1e = guest_map_l1e(new_addr, &gl1mfn);
- if ( !pl1e )
- {
- MEM_LOG("Could not find L1 PTE for address %lx",
- (unsigned long)new_addr);
- return GNTST_general_error;
- }
-
- if ( !get_page_from_pagenr(gl1mfn, current->domain) )
- {
- guest_unmap_l1e(pl1e);
- return GNTST_general_error;
- }
+ return create_grant_pv_mapping(addr, frame, flags, cache_flags);
+}
- l1pg = mfn_to_page(gl1mfn);
- if ( !page_lock(l1pg) )
- {
- put_page(l1pg);
- guest_unmap_l1e(pl1e);
- return GNTST_general_error;
- }
+static int replace_grant_p2m_mapping(
+ uint64_t addr, unsigned long frame, uint64_t new_addr, unsigned int flags)
+{
+ unsigned long gfn = (unsigned long)(addr >> PAGE_SHIFT);
+ p2m_type_t type;
+ mfn_t old_mfn;
+ struct domain *d = current->domain;
- if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
- {
- page_unlock(l1pg);
- put_page(l1pg);
- guest_unmap_l1e(pl1e);
+ if ( new_addr != 0 || (flags & GNTMAP_contains_pte) )
return GNTST_general_error;
- }
-
- ol1e = *pl1e;
- if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, l1e_empty(),
- gl1mfn, curr, 0)) )
+ old_mfn = get_gfn(d, gfn, &type);
+ if ( !p2m_is_grant(type) || mfn_x(old_mfn) != frame )
{
- page_unlock(l1pg);
- put_page(l1pg);
- MEM_LOG("Cannot delete PTE entry at %p", (unsigned long *)pl1e);
- guest_unmap_l1e(pl1e);
+ put_gfn(d, gfn);
+ MEM_LOG("replace_grant_p2m_mapping: old mapping invalid (type %d, mfn %lx, frame %lx)",
+ type, mfn_x(old_mfn), frame);
return GNTST_general_error;
}
+ guest_physmap_remove_page(d, _gfn(gfn), _mfn(frame), PAGE_ORDER_4K);
- page_unlock(l1pg);
- put_page(l1pg);
- guest_unmap_l1e(pl1e);
-
- rc = replace_grant_va_mapping(addr, frame, ol1e, curr);
- if ( rc && !paging_mode_refcounts(curr->domain) )
- put_page_from_l1e(ol1e, curr->domain);
-
- return rc;
+ put_gfn(d, gfn);
+ return GNTST_okay;
}
int replace_grant_host_mapping(uint64_t addr, unsigned long frame,
@@ -4405,125 +3120,6 @@ int steal_page(
return -1;
}
-static int __do_update_va_mapping(
- unsigned long va, u64 val64, unsigned long flags, struct domain *pg_owner)
-{
- l1_pgentry_t val = l1e_from_intpte(val64);
- struct vcpu *v = current;
- struct domain *d = v->domain;
- struct page_info *gl1pg;
- l1_pgentry_t *pl1e;
- unsigned long bmap_ptr, gl1mfn;
- cpumask_t *mask = NULL;
- int rc;
-
- perfc_incr(calls_to_update_va);
-
- rc = xsm_update_va_mapping(XSM_TARGET, d, pg_owner, val);
- if ( rc )
- return rc;
-
- rc = -EINVAL;
- pl1e = guest_map_l1e(va, &gl1mfn);
- if ( unlikely(!pl1e || !get_page_from_pagenr(gl1mfn, d)) )
- goto out;
-
- gl1pg = mfn_to_page(gl1mfn);
- if ( !page_lock(gl1pg) )
- {
- put_page(gl1pg);
- goto out;
- }
-
- if ( (gl1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
- {
- page_unlock(gl1pg);
- put_page(gl1pg);
- goto out;
- }
-
- rc = mod_l1_entry(pl1e, val, gl1mfn, 0, v, pg_owner);
-
- page_unlock(gl1pg);
- put_page(gl1pg);
-
- out:
- if ( pl1e )
- guest_unmap_l1e(pl1e);
-
- switch ( flags & UVMF_FLUSHTYPE_MASK )
- {
- case UVMF_TLB_FLUSH:
- switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) )
- {
- case UVMF_LOCAL:
- flush_tlb_local();
- break;
- case UVMF_ALL:
- mask = d->domain_dirty_cpumask;
- break;
- default:
- mask = this_cpu(scratch_cpumask);
- rc = mm_vcpumask_to_pcpumask(d,
- const_guest_handle_from_ptr(bmap_ptr,
- void),
- mask);
- break;
- }
- if ( mask )
- flush_tlb_mask(mask);
- break;
-
- case UVMF_INVLPG:
- switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) )
- {
- case UVMF_LOCAL:
- paging_invlpg(v, va);
- break;
- case UVMF_ALL:
- mask = d->domain_dirty_cpumask;
- break;
- default:
- mask = this_cpu(scratch_cpumask);
- rc = mm_vcpumask_to_pcpumask(d,
- const_guest_handle_from_ptr(bmap_ptr,
- void),
- mask);
- break;
- }
- if ( mask )
- flush_tlb_one_mask(mask, va);
- break;
- }
-
- return rc;
-}
-
-long do_update_va_mapping(unsigned long va, u64 val64,
- unsigned long flags)
-{
- return __do_update_va_mapping(va, val64, flags, current->domain);
-}
-
-long do_update_va_mapping_otherdomain(unsigned long va, u64 val64,
- unsigned long flags,
- domid_t domid)
-{
- struct domain *pg_owner;
- int rc;
-
- if ( (pg_owner = mm_get_pg_owner(domid)) == NULL )
- return -ESRCH;
-
- rc = __do_update_va_mapping(va, val64, flags, pg_owner);
-
- mm_put_pg_owner(pg_owner);
-
- return rc;
-}
-
-
-
/*************************
* Descriptor Tables
*/
@@ -5084,466 +3680,6 @@ long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
return 0;
}
-
-/*************************
- * Writable Pagetables
- */
-
-struct ptwr_emulate_ctxt {
- struct x86_emulate_ctxt ctxt;
- unsigned long cr2;
- l1_pgentry_t pte;
-};
-
-static int ptwr_emulated_read(
- enum x86_segment seg,
- unsigned long offset,
- void *p_data,
- unsigned int bytes,
- struct x86_emulate_ctxt *ctxt)
-{
- unsigned int rc = bytes;
- unsigned long addr = offset;
-
- if ( !__addr_ok(addr) ||
- (rc = __copy_from_user(p_data, (void *)addr, bytes)) )
- {
- x86_emul_pagefault(0, addr + bytes - rc, ctxt); /* Read fault. */
- return X86EMUL_EXCEPTION;
- }
-
- return X86EMUL_OKAY;
-}
-
-static int ptwr_emulated_update(
- unsigned long addr,
- paddr_t old,
- paddr_t val,
- unsigned int bytes,
- unsigned int do_cmpxchg,
- struct ptwr_emulate_ctxt *ptwr_ctxt)
-{
- unsigned long mfn;
- unsigned long unaligned_addr = addr;
- struct page_info *page;
- l1_pgentry_t pte, ol1e, nl1e, *pl1e;
- struct vcpu *v = current;
- struct domain *d = v->domain;
- int ret;
-
- /* Only allow naturally-aligned stores within the original %cr2 page. */
- if ( unlikely(((addr^ptwr_ctxt->cr2) & PAGE_MASK) || (addr & (bytes-1))) )
- {
- MEM_LOG("ptwr_emulate: bad access (cr2=%lx, addr=%lx, bytes=%u)",
- ptwr_ctxt->cr2, addr, bytes);
- return X86EMUL_UNHANDLEABLE;
- }
-
- /* Turn a sub-word access into a full-word access. */
- if ( bytes != sizeof(paddr_t) )
- {
- paddr_t full;
- unsigned int rc, offset = addr & (sizeof(paddr_t)-1);
-
- /* Align address; read full word. */
- addr &= ~(sizeof(paddr_t)-1);
- if ( (rc = copy_from_user(&full, (void *)addr, sizeof(paddr_t))) != 0 )
- {
- x86_emul_pagefault(0, /* Read fault. */
- addr + sizeof(paddr_t) - rc,
- &ptwr_ctxt->ctxt);
- return X86EMUL_EXCEPTION;
- }
- /* Mask out bits provided by caller. */
- full &= ~((((paddr_t)1 << (bytes*8)) - 1) << (offset*8));
- /* Shift the caller value and OR in the missing bits. */
- val &= (((paddr_t)1 << (bytes*8)) - 1);
- val <<= (offset)*8;
- val |= full;
- /* Also fill in missing parts of the cmpxchg old value. */
- old &= (((paddr_t)1 << (bytes*8)) - 1);
- old <<= (offset)*8;
- old |= full;
- }
-
- pte = ptwr_ctxt->pte;
- mfn = l1e_get_pfn(pte);
- page = mfn_to_page(mfn);
-
- /* We are looking only for read-only mappings of p.t. pages. */
- ASSERT((l1e_get_flags(pte) & (_PAGE_RW|_PAGE_PRESENT)) == _PAGE_PRESENT);
- ASSERT(mfn_valid(_mfn(mfn)));
- ASSERT((page->u.inuse.type_info & PGT_type_mask) == PGT_l1_page_table);
- ASSERT((page->u.inuse.type_info & PGT_count_mask) != 0);
- ASSERT(page_get_owner(page) == d);
-
- /* Check the new PTE. */
- nl1e = l1e_from_intpte(val);
- switch ( ret = get_page_from_l1e(nl1e, d, d) )
- {
- default:
- if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
- !do_cmpxchg && (l1e_get_flags(nl1e) & _PAGE_PRESENT) )
- {
- /*
- * If this is an upper-half write to a PAE PTE then we assume that
- * the guest has simply got the two writes the wrong way round. We
- * zap the PRESENT bit on the assumption that the bottom half will
- * be written immediately after we return to the guest.
- */
- gdprintk(XENLOG_DEBUG, "ptwr_emulate: fixing up invalid PAE PTE %"
- PRIpte"\n", l1e_get_intpte(nl1e));
- l1e_remove_flags(nl1e, _PAGE_PRESENT);
- }
- else
- {
- MEM_LOG("ptwr_emulate: could not get_page_from_l1e()");
- return X86EMUL_UNHANDLEABLE;
- }
- break;
- case 0:
- break;
- case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
- ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
- l1e_flip_flags(nl1e, ret);
- break;
- }
-
- adjust_guest_l1e(nl1e, d);
-
- /* Checked successfully: do the update (write or cmpxchg). */
- pl1e = map_domain_page(_mfn(mfn));
- pl1e = (l1_pgentry_t *)((unsigned long)pl1e + (addr & ~PAGE_MASK));
- if ( do_cmpxchg )
- {
- int okay;
- intpte_t t = old;
- ol1e = l1e_from_intpte(old);
-
- okay = paging_cmpxchg_guest_entry(v, &l1e_get_intpte(*pl1e),
- &t, l1e_get_intpte(nl1e), _mfn(mfn));
- okay = (okay && t == old);
-
- if ( !okay )
- {
- unmap_domain_page(pl1e);
- put_page_from_l1e(nl1e, d);
- return X86EMUL_RETRY;
- }
- }
- else
- {
- ol1e = *pl1e;
- if ( !UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn, v, 0) )
- BUG();
- }
-
- trace_ptwr_emulation(addr, nl1e);
-
- unmap_domain_page(pl1e);
-
- /* Finally, drop the old PTE. */
- put_page_from_l1e(ol1e, d);
-
- return X86EMUL_OKAY;
-}
-
-static int ptwr_emulated_write(
- enum x86_segment seg,
- unsigned long offset,
- void *p_data,
- unsigned int bytes,
- struct x86_emulate_ctxt *ctxt)
-{
- paddr_t val = 0;
-
- if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) || !bytes )
- {
- MEM_LOG("ptwr_emulate: bad write size (addr=%lx, bytes=%u)",
- offset, bytes);
- return X86EMUL_UNHANDLEABLE;
- }
-
- memcpy(&val, p_data, bytes);
-
- return ptwr_emulated_update(
- offset, 0, val, bytes, 0,
- container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
-}
-
-static int ptwr_emulated_cmpxchg(
- enum x86_segment seg,
- unsigned long offset,
- void *p_old,
- void *p_new,
- unsigned int bytes,
- struct x86_emulate_ctxt *ctxt)
-{
- paddr_t old = 0, new = 0;
-
- if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes -1)) )
- {
- MEM_LOG("ptwr_emulate: bad cmpxchg size (addr=%lx, bytes=%u)",
- offset, bytes);
- return X86EMUL_UNHANDLEABLE;
- }
-
- memcpy(&old, p_old, bytes);
- memcpy(&new, p_new, bytes);
-
- return ptwr_emulated_update(
- offset, old, new, bytes, 1,
- container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
-}
-
-static int pv_emul_is_mem_write(const struct x86_emulate_state *state,
- struct x86_emulate_ctxt *ctxt)
-{
- return x86_insn_is_mem_write(state, ctxt) ? X86EMUL_OKAY
- : X86EMUL_UNHANDLEABLE;
-}
-
-static const struct x86_emulate_ops ptwr_emulate_ops = {
- .read = ptwr_emulated_read,
- .insn_fetch = ptwr_emulated_read,
- .write = ptwr_emulated_write,
- .cmpxchg = ptwr_emulated_cmpxchg,
- .validate = pv_emul_is_mem_write,
- .cpuid = pv_emul_cpuid,
-};
-
-/* Write page fault handler: check if guest is trying to modify a PTE. */
-int ptwr_do_page_fault(struct vcpu *v, unsigned long addr,
- struct cpu_user_regs *regs)
-{
- struct domain *d = v->domain;
- struct page_info *page;
- l1_pgentry_t pte;
- struct ptwr_emulate_ctxt ptwr_ctxt = {
- .ctxt = {
- .regs = regs,
- .vendor = d->arch.cpuid->x86_vendor,
- .addr_size = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
- .sp_size = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
- .swint_emulate = x86_swint_emulate_none,
- },
- };
- int rc;
-
- /* Attempt to read the PTE that maps the VA being accessed. */
- guest_get_eff_l1e(addr, &pte);
-
- /* We are looking only for read-only mappings of p.t. pages. */
- if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) ||
- rangeset_contains_singleton(mmio_ro_ranges, l1e_get_pfn(pte)) ||
- !get_page_from_pagenr(l1e_get_pfn(pte), d) )
- goto bail;
-
- page = l1e_get_page(pte);
- if ( !page_lock(page) )
- {
- put_page(page);
- goto bail;
- }
-
- if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
- {
- page_unlock(page);
- put_page(page);
- goto bail;
- }
-
- ptwr_ctxt.cr2 = addr;
- ptwr_ctxt.pte = pte;
-
- rc = x86_emulate(&ptwr_ctxt.ctxt, &ptwr_emulate_ops);
-
- page_unlock(page);
- put_page(page);
-
- switch ( rc )
- {
- case X86EMUL_EXCEPTION:
- /*
- * This emulation only covers writes to pagetables which are marked
- * read-only by Xen. We tolerate #PF (in case a concurrent pagetable
- * update has succeeded on a different vcpu). Anything else is an
- * emulation bug, or a guest playing with the instruction stream under
- * Xen's feet.
- */
- if ( ptwr_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
- ptwr_ctxt.ctxt.event.vector == TRAP_page_fault )
- pv_inject_event(&ptwr_ctxt.ctxt.event);
- else
- gdprintk(XENLOG_WARNING,
- "Unexpected event (type %u, vector %#x) from emulation\n",
- ptwr_ctxt.ctxt.event.type, ptwr_ctxt.ctxt.event.vector);
-
- /* Fallthrough */
- case X86EMUL_OKAY:
-
- if ( ptwr_ctxt.ctxt.retire.singlestep )
- pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
-
- /* Fallthrough */
- case X86EMUL_RETRY:
- perfc_incr(ptwr_emulations);
- return EXCRET_fault_fixed;
- }
-
- bail:
- return 0;
-}
-
-/*************************
- * fault handling for read-only MMIO pages
- */
-
-int mmio_ro_emulated_write(
- enum x86_segment seg,
- unsigned long offset,
- void *p_data,
- unsigned int bytes,
- struct x86_emulate_ctxt *ctxt)
-{
- struct mmio_ro_emulate_ctxt *mmio_ro_ctxt = ctxt->data;
-
- /* Only allow naturally-aligned stores at the original %cr2 address. */
- if ( ((bytes | offset) & (bytes - 1)) || !bytes ||
- offset != mmio_ro_ctxt->cr2 )
- {
- MEM_LOG("mmio_ro_emulate: bad access (cr2=%lx, addr=%lx, bytes=%u)",
- mmio_ro_ctxt->cr2, offset, bytes);
- return X86EMUL_UNHANDLEABLE;
- }
-
- return X86EMUL_OKAY;
-}
-
-static const struct x86_emulate_ops mmio_ro_emulate_ops = {
- .read = x86emul_unhandleable_rw,
- .insn_fetch = ptwr_emulated_read,
- .write = mmio_ro_emulated_write,
- .validate = pv_emul_is_mem_write,
- .cpuid = pv_emul_cpuid,
-};
-
-int mmcfg_intercept_write(
- enum x86_segment seg,
- unsigned long offset,
- void *p_data,
- unsigned int bytes,
- struct x86_emulate_ctxt *ctxt)
-{
- struct mmio_ro_emulate_ctxt *mmio_ctxt = ctxt->data;
-
- /*
- * Only allow naturally-aligned stores no wider than 4 bytes to the
- * original %cr2 address.
- */
- if ( ((bytes | offset) & (bytes - 1)) || bytes > 4 || !bytes ||
- offset != mmio_ctxt->cr2 )
- {
- MEM_LOG("mmcfg_intercept: bad write (cr2=%lx, addr=%lx, bytes=%u)",
- mmio_ctxt->cr2, offset, bytes);
- return X86EMUL_UNHANDLEABLE;
- }
-
- offset &= 0xfff;
- if ( pci_conf_write_intercept(mmio_ctxt->seg, mmio_ctxt->bdf,
- offset, bytes, p_data) >= 0 )
- pci_mmcfg_write(mmio_ctxt->seg, PCI_BUS(mmio_ctxt->bdf),
- PCI_DEVFN2(mmio_ctxt->bdf), offset, bytes,
- *(uint32_t *)p_data);
-
- return X86EMUL_OKAY;
-}
-
-static const struct x86_emulate_ops mmcfg_intercept_ops = {
- .read = x86emul_unhandleable_rw,
- .insn_fetch = ptwr_emulated_read,
- .write = mmcfg_intercept_write,
- .validate = pv_emul_is_mem_write,
- .cpuid = pv_emul_cpuid,
-};
-
-/* Check if guest is trying to modify a r/o MMIO page. */
-int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr,
- struct cpu_user_regs *regs)
-{
- l1_pgentry_t pte;
- unsigned long mfn;
- unsigned int addr_size = is_pv_32bit_vcpu(v) ? 32 : BITS_PER_LONG;
- struct mmio_ro_emulate_ctxt mmio_ro_ctxt = { .cr2 = addr };
- struct x86_emulate_ctxt ctxt = {
- .regs = regs,
- .vendor = v->domain->arch.cpuid->x86_vendor,
- .addr_size = addr_size,
- .sp_size = addr_size,
- .swint_emulate = x86_swint_emulate_none,
- .data = &mmio_ro_ctxt
- };
- int rc;
-
- /* Attempt to read the PTE that maps the VA being accessed. */
- guest_get_eff_l1e(addr, &pte);
-
- /* We are looking only for read-only mappings of MMIO pages. */
- if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) )
- return 0;
-
- mfn = l1e_get_pfn(pte);
- if ( mfn_valid(_mfn(mfn)) )
- {
- struct page_info *page = mfn_to_page(mfn);
- struct domain *owner = page_get_owner_and_reference(page);
-
- if ( owner )
- put_page(page);
- if ( owner != dom_io )
- return 0;
- }
-
- if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
- return 0;
-
- if ( pci_ro_mmcfg_decode(mfn, &mmio_ro_ctxt.seg, &mmio_ro_ctxt.bdf) )
- rc = x86_emulate(&ctxt, &mmcfg_intercept_ops);
- else
- rc = x86_emulate(&ctxt, &mmio_ro_emulate_ops);
-
- switch ( rc )
- {
- case X86EMUL_EXCEPTION:
- /*
- * This emulation only covers writes to MMCFG space or read-only MFNs.
- * We tolerate #PF (from hitting an adjacent page or a successful
- * concurrent pagetable update). Anything else is an emulation bug,
- * or a guest playing with the instruction stream under Xen's feet.
- */
- if ( ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
- ctxt.event.vector == TRAP_page_fault )
- pv_inject_event(&ctxt.event);
- else
- gdprintk(XENLOG_WARNING,
- "Unexpected event (type %u, vector %#x) from emulation\n",
- ctxt.event.type, ctxt.event.vector);
-
- /* Fallthrough */
- case X86EMUL_OKAY:
-
- if ( ctxt.retire.singlestep )
- pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
-
- /* Fallthrough */
- case X86EMUL_RETRY:
- perfc_incr(ptwr_emulations);
- return EXCRET_fault_fixed;
- }
-
- return 0;
-}
-
void *alloc_xen_pagetable(void)
{
if ( system_state != SYS_STATE_early_boot )
diff --git a/xen/arch/x86/pv/Makefile b/xen/arch/x86/pv/Makefile
index ea94599438..665be5536c 100644
--- a/xen/arch/x86/pv/Makefile
+++ b/xen/arch/x86/pv/Makefile
@@ -1,2 +1,3 @@
obj-y += hypercall.o
obj-bin-y += dom0_build.init.o
+obj-y += mm.o
diff --git a/xen/arch/x86/pv/mm.c b/xen/arch/x86/pv/mm.c
new file mode 100644
index 0000000000..fd157b9f58
--- /dev/null
+++ b/xen/arch/x86/pv/mm.c
@@ -0,0 +1,1902 @@
+/******************************************************************************
+ * arch/x86/pv/mm.c
+ *
+ * Copyright (c) 2002-2005 K A Fraser
+ * Copyright (c) 2004 Christian Limpach
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * A description of the x86 page table API:
+ *
+ * Domains trap to do_mmu_update with a list of update requests.
+ * This is a list of (ptr, val) pairs, where the requested operation
+ * is *ptr = val.
+ *
+ * Reference counting of pages:
+ * ----------------------------
+ * Each page has two refcounts: tot_count and type_count.
+ *
+ * TOT_COUNT is the obvious reference count. It counts all uses of a
+ * physical page frame by a domain, including uses as a page directory,
+ * a page table, or simple mappings via a PTE. This count prevents a
+ * domain from releasing a frame back to the free pool when it still holds
+ * a reference to it.
+ *
+ * TYPE_COUNT is more subtle. A frame can be put to one of three
+ * mutually-exclusive uses: it might be used as a page directory, or a
+ * page table, or it may be mapped writable by the domain [of course, a
+ * frame may not be used in any of these three ways!].
+ * So, type_count is a count of the number of times a frame is being
+ * referred to in its current incarnation. Therefore, a page can only
+ * change its type when its type count is zero.
+ *
+ * Pinning the page type:
+ * ----------------------
+ * The type of a page can be pinned/unpinned with the commands
+ * MMUEXT_[UN]PIN_L?_TABLE. Each page can be pinned exactly once (that is,
+ * pinning is not reference counted, so it can't be nested).
+ * This is useful to prevent a page's type count falling to zero, at which
+ * point safety checks would need to be carried out next time the count
+ * is increased again.
+ *
+ * A further note on writable page mappings:
+ * -----------------------------------------
+ * For simplicity, the count of writable mappings for a page may not
+ * correspond to reality. The 'writable count' is incremented for every
+ * PTE which maps the page with the _PAGE_RW flag set. However, for
+ * write access to be possible the page directory entry must also have
+ * its _PAGE_RW bit set. We do not check this as it complicates the
+ * reference counting considerably [consider the case of multiple
+ * directory entries referencing a single page table, some with the RW
+ * bit set, others not -- it starts getting a bit messy].
+ * In normal use, this simplification shouldn't be a problem.
+ * However, the logic can be added if required.
+ *
+ * One more note on read-only page mappings:
+ * -----------------------------------------
+ * We want domains to be able to map pages for read-only access. The
+ * main reason is that page tables and directories should be readable
+ * by a domain, but it would not be safe for them to be writable.
+ * However, domains have free access to rings 1 & 2 of the Intel
+ * privilege model. In terms of page protection, these are considered
+ * to be part of 'supervisor mode'. The WP bit in CR0 controls whether
+ * read-only restrictions are respected in supervisor mode -- if the
+ * bit is clear then any mapped page is writable.
+ *
+ * We get round this by always setting the WP bit and disallowing
+ * updates to it. This is very unlikely to cause a problem for guest
+ * OS's, which will generally use the WP bit to simplify copy-on-write
+ * implementation (in that case, OS wants a fault when it writes to
+ * an application-supplied buffer).
+ */
+
+#include <xen/event.h>
+#include <xen/guest_access.h>
+#include <xen/hypercall.h>
+#include <xen/mm.h>
+#include <xen/sched.h>
+#include <xen/trace.h>
+#include <xsm/xsm.h>
+
+#include <asm/p2m.h>
+#include <asm/paging.h>
+#include <asm/x86_emulate.h>
+
+/*
+ * PTE updates can be done with ordinary writes except:
+ * 1. Debug builds get extra checking by using CMPXCHG[8B].
+ */
+#if !defined(NDEBUG)
+#define PTE_UPDATE_WITH_CMPXCHG
+#endif
+
+/* How to write an entry to the guest pagetables.
+ * Returns 0 for failure (pointer not valid), 1 for success. */
+static inline int update_intpte(intpte_t *p,
+ intpte_t old,
+ intpte_t new,
+ unsigned long mfn,
+ struct vcpu *v,
+ int preserve_ad)
+{
+ int rv = 1;
+#ifndef PTE_UPDATE_WITH_CMPXCHG
+ if ( !preserve_ad )
+ {
+ rv = paging_write_guest_entry(v, p, new, _mfn(mfn));
+ }
+ else
+#endif
+ {
+ intpte_t t = old;
+ for ( ; ; )
+ {
+ intpte_t _new = new;
+ if ( preserve_ad )
+ _new |= old & (_PAGE_ACCESSED | _PAGE_DIRTY);
+
+ rv = paging_cmpxchg_guest_entry(v, p, &t, _new, _mfn(mfn));
+ if ( unlikely(rv == 0) )
+ {
+ MEM_LOG("Failed to update %" PRIpte " -> %" PRIpte
+ ": saw %" PRIpte, old, _new, t);
+ break;
+ }
+
+ if ( t == old )
+ break;
+
+ /* Allowed to change in Accessed/Dirty flags only. */
+ BUG_ON((t ^ old) & ~(intpte_t)(_PAGE_ACCESSED|_PAGE_DIRTY));
+
+ old = t;
+ }
+ }
+ return rv;
+}
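+
+/*
+ * Worked example of the preserve_ad case above (values illustrative only):
+ * with old = frame | _PAGE_PRESENT | _PAGE_ACCESSED and
+ * new = frame | _PAGE_PRESENT | _PAGE_RW, the value actually written is
+ * new | _PAGE_ACCESSED, i.e. the A/D state the guest has accumulated is
+ * carried across the update.
+ */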
+
+/*
+ * Macro that wraps the appropriate type-changes around update_intpte().
+ * Arguments are: type, ptr, old, new, mfn, vcpu, preserve_ad.
+ */
+#define UPDATE_ENTRY(_t,_p,_o,_n,_m,_v,_ad) \
+ update_intpte(&_t ## e_get_intpte(*(_p)), \
+ _t ## e_get_intpte(_o), _t ## e_get_intpte(_n), \
+ (_m), (_v), (_ad))
+
+/*
+ * PTE flags that a guest may change without re-validating the PTE.
+ * All other bits affect translation, caching, or Xen's safety.
+ */
+#define FASTPATH_FLAG_WHITELIST \
+ (_PAGE_NX_BIT | _PAGE_AVAIL_HIGH | _PAGE_AVAIL | _PAGE_GLOBAL | \
+ _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_USER)
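+
+/*
+ * Example: a guest that only toggles _PAGE_ACCESSED or _PAGE_NX_BIT on an
+ * already-validated entry is handled by the fast paths below, whereas
+ * flipping _PAGE_RW or changing the frame number forces a full
+ * get/put_page_from_l*e() revalidation.
+ */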
+
+/* Update the L1 entry at pl1e to new value nl1e. */
+static int mod_l1_entry(l1_pgentry_t *pl1e, l1_pgentry_t nl1e,
+ unsigned long gl1mfn, int preserve_ad,
+ struct vcpu *pt_vcpu, struct domain *pg_dom)
+{
+ l1_pgentry_t ol1e;
+ struct domain *pt_dom = pt_vcpu->domain;
+ int rc = 0;
+
+ if ( unlikely(__copy_from_user(&ol1e, pl1e, sizeof(ol1e)) != 0) )
+ return -EFAULT;
+
+ if ( unlikely(paging_mode_refcounts(pt_dom)) )
+ {
+ if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu, preserve_ad) )
+ return 0;
+ return -EBUSY;
+ }
+
+ if ( l1e_get_flags(nl1e) & _PAGE_PRESENT )
+ {
+ /* Translate foreign guest addresses. */
+ struct page_info *page = NULL;
+
+ if ( unlikely(l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom)) )
+ {
+ MEM_LOG("Bad L1 flags %x",
+ l1e_get_flags(nl1e) & l1_disallow_mask(pt_dom));
+ return -EINVAL;
+ }
+
+ if ( paging_mode_translate(pg_dom) )
+ {
+ page = get_page_from_gfn(pg_dom, l1e_get_pfn(nl1e), NULL, P2M_ALLOC);
+ if ( !page )
+ return -EINVAL;
+ nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(nl1e));
+ }
+
+ /* Fast path for sufficiently-similar mappings. */
+ if ( !l1e_has_changed(ol1e, nl1e, ~FASTPATH_FLAG_WHITELIST) )
+ {
+ adjust_guest_l1e(nl1e, pt_dom);
+ rc = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
+ preserve_ad);
+ if ( page )
+ put_page(page);
+ return rc ? 0 : -EBUSY;
+ }
+
+ switch ( rc = get_page_from_l1e(nl1e, pt_dom, pg_dom) )
+ {
+ default:
+ if ( page )
+ put_page(page);
+ return rc;
+ case 0:
+ break;
+ case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+ ASSERT(!(rc & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+ l1e_flip_flags(nl1e, rc);
+ rc = 0;
+ break;
+ }
+ if ( page )
+ put_page(page);
+
+ adjust_guest_l1e(nl1e, pt_dom);
+ if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
+ preserve_ad)) )
+ {
+ ol1e = nl1e;
+ rc = -EBUSY;
+ }
+ }
+ else if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
+ preserve_ad)) )
+ {
+ return -EBUSY;
+ }
+
+ put_page_from_l1e(ol1e, pt_dom);
+ return rc;
+}
+
+
+/* Update the L2 entry at pl2e to new value nl2e. pl2e is within frame pfn. */
+static int mod_l2_entry(l2_pgentry_t *pl2e,
+ l2_pgentry_t nl2e,
+ unsigned long pfn,
+ int preserve_ad,
+ struct vcpu *vcpu)
+{
+ l2_pgentry_t ol2e;
+ struct domain *d = vcpu->domain;
+ struct page_info *l2pg = mfn_to_page(pfn);
+ unsigned long type = l2pg->u.inuse.type_info;
+ int rc = 0;
+
+ if ( unlikely(!is_guest_l2_slot(d, type, pgentry_ptr_to_slot(pl2e))) )
+ {
+ MEM_LOG("Illegal L2 update attempt in Xen-private area %p", pl2e);
+ return -EPERM;
+ }
+
+ if ( unlikely(__copy_from_user(&ol2e, pl2e, sizeof(ol2e)) != 0) )
+ return -EFAULT;
+
+ if ( l2e_get_flags(nl2e) & _PAGE_PRESENT )
+ {
+ if ( unlikely(l2e_get_flags(nl2e) & L2_DISALLOW_MASK) )
+ {
+ MEM_LOG("Bad L2 flags %x",
+ l2e_get_flags(nl2e) & L2_DISALLOW_MASK);
+ return -EINVAL;
+ }
+
+ /* Fast path for sufficiently-similar mappings. */
+ if ( !l2e_has_changed(ol2e, nl2e, ~FASTPATH_FLAG_WHITELIST) )
+ {
+ adjust_guest_l2e(nl2e, d);
+ if ( UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu, preserve_ad) )
+ return 0;
+ return -EBUSY;
+ }
+
+ if ( unlikely((rc = get_page_from_l2e(nl2e, pfn, d)) < 0) )
+ return rc;
+
+ adjust_guest_l2e(nl2e, d);
+ if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu,
+ preserve_ad)) )
+ {
+ ol2e = nl2e;
+ rc = -EBUSY;
+ }
+ }
+ else if ( unlikely(!UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu,
+ preserve_ad)) )
+ {
+ return -EBUSY;
+ }
+
+ put_page_from_l2e(ol2e, pfn);
+ return rc;
+}
+
+/* Update the L3 entry at pl3e to new value nl3e. pl3e is within frame pfn. */
+static int mod_l3_entry(l3_pgentry_t *pl3e,
+ l3_pgentry_t nl3e,
+ unsigned long pfn,
+ int preserve_ad,
+ struct vcpu *vcpu)
+{
+ l3_pgentry_t ol3e;
+ struct domain *d = vcpu->domain;
+ int rc = 0;
+
+ if ( unlikely(!is_guest_l3_slot(pgentry_ptr_to_slot(pl3e))) )
+ {
+ MEM_LOG("Illegal L3 update attempt in Xen-private area %p", pl3e);
+ return -EINVAL;
+ }
+
+ /*
+ * Disallow updates to final L3 slot. It contains Xen mappings, and it
+ * would be a pain to ensure they remain continuously valid throughout.
+ */
+ if ( is_pv_32bit_domain(d) && (pgentry_ptr_to_slot(pl3e) >= 3) )
+ return -EINVAL;
+
+ if ( unlikely(__copy_from_user(&ol3e, pl3e, sizeof(ol3e)) != 0) )
+ return -EFAULT;
+
+ if ( l3e_get_flags(nl3e) & _PAGE_PRESENT )
+ {
+ if ( unlikely(l3e_get_flags(nl3e) & l3_disallow_mask(d)) )
+ {
+ MEM_LOG("Bad L3 flags %x",
+ l3e_get_flags(nl3e) & l3_disallow_mask(d));
+ return -EINVAL;
+ }
+
+ /* Fast path for sufficiently-similar mappings. */
+ if ( !l3e_has_changed(ol3e, nl3e, ~FASTPATH_FLAG_WHITELIST) )
+ {
+ adjust_guest_l3e(nl3e, d);
+ rc = UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu, preserve_ad);
+ return rc ? 0 : -EFAULT;
+ }
+
+ rc = get_page_from_l3e(nl3e, pfn, d, 0);
+ if ( unlikely(rc < 0) )
+ return rc;
+ rc = 0;
+
+ adjust_guest_l3e(nl3e, d);
+ if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu,
+ preserve_ad)) )
+ {
+ ol3e = nl3e;
+ rc = -EFAULT;
+ }
+ }
+ else if ( unlikely(!UPDATE_ENTRY(l3, pl3e, ol3e, nl3e, pfn, vcpu,
+ preserve_ad)) )
+ {
+ return -EFAULT;
+ }
+
+ if ( likely(rc == 0) )
+ if ( !create_pae_xen_mappings(d, pl3e) )
+ BUG();
+
+ put_page_from_l3e(ol3e, pfn, 0, 1);
+ return rc;
+}
+
+/* Update the L4 entry at pl4e to new value nl4e. pl4e is within frame pfn. */
+static int mod_l4_entry(l4_pgentry_t *pl4e,
+ l4_pgentry_t nl4e,
+ unsigned long pfn,
+ int preserve_ad,
+ struct vcpu *vcpu)
+{
+ struct domain *d = vcpu->domain;
+ l4_pgentry_t ol4e;
+ int rc = 0;
+
+ if ( unlikely(!is_guest_l4_slot(d, pgentry_ptr_to_slot(pl4e))) )
+ {
+ MEM_LOG("Illegal L4 update attempt in Xen-private area %p", pl4e);
+ return -EINVAL;
+ }
+
+ if ( unlikely(__copy_from_user(&ol4e, pl4e, sizeof(ol4e)) != 0) )
+ return -EFAULT;
+
+ if ( l4e_get_flags(nl4e) & _PAGE_PRESENT )
+ {
+ if ( unlikely(l4e_get_flags(nl4e) & L4_DISALLOW_MASK) )
+ {
+ MEM_LOG("Bad L4 flags %x",
+ l4e_get_flags(nl4e) & L4_DISALLOW_MASK);
+ return -EINVAL;
+ }
+
+ /* Fast path for sufficiently-similar mappings. */
+ if ( !l4e_has_changed(ol4e, nl4e, ~FASTPATH_FLAG_WHITELIST) )
+ {
+ adjust_guest_l4e(nl4e, d);
+ rc = UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu, preserve_ad);
+ return rc ? 0 : -EFAULT;
+ }
+
+ rc = get_page_from_l4e(nl4e, pfn, d, 0);
+ if ( unlikely(rc < 0) )
+ return rc;
+ rc = 0;
+
+ adjust_guest_l4e(nl4e, d);
+ if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu,
+ preserve_ad)) )
+ {
+ ol4e = nl4e;
+ rc = -EFAULT;
+ }
+ }
+ else if ( unlikely(!UPDATE_ENTRY(l4, pl4e, ol4e, nl4e, pfn, vcpu,
+ preserve_ad)) )
+ {
+ return -EFAULT;
+ }
+
+ put_page_from_l4e(ol4e, pfn, 0, 1);
+ return rc;
+}
+
+/* Get a mapping of a PV guest's l1e for this virtual address. */
+static l1_pgentry_t *guest_map_l1e(unsigned long addr, unsigned long *gl1mfn)
+{
+ l2_pgentry_t l2e;
+
+ ASSERT(!paging_mode_translate(current->domain));
+ ASSERT(!paging_mode_external(current->domain));
+
+ if ( unlikely(!__addr_ok(addr)) )
+ return NULL;
+
+ /* Find this l1e and its enclosing l1mfn in the linear map. */
+ if ( __copy_from_user(&l2e,
+ &__linear_l2_table[l2_linear_offset(addr)],
+ sizeof(l2_pgentry_t)) )
+ return NULL;
+
+    /* Check the flags to be sure it is safe to read the l1e. */
+ if ( (l2e_get_flags(l2e) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT )
+ return NULL;
+
+ *gl1mfn = l2e_get_pfn(l2e);
+
+ return (l1_pgentry_t *)map_domain_page(_mfn(*gl1mfn)) +
+ l1_table_offset(addr);
+}
+
+/* Pull down the mapping we got from guest_map_l1e(). */
+static inline void guest_unmap_l1e(void *p)
+{
+ unmap_domain_page(p);
+}
+
+static int create_grant_va_mapping(
+ unsigned long va, l1_pgentry_t nl1e, struct vcpu *v)
+{
+ l1_pgentry_t *pl1e, ol1e;
+ struct domain *d = v->domain;
+ unsigned long gl1mfn;
+ struct page_info *l1pg;
+ int okay;
+
+ adjust_guest_l1e(nl1e, d);
+
+ pl1e = guest_map_l1e(va, &gl1mfn);
+ if ( !pl1e )
+ {
+ MEM_LOG("Could not find L1 PTE for address %lx", va);
+ return GNTST_general_error;
+ }
+
+ if ( !get_page_from_pagenr(gl1mfn, current->domain) )
+ {
+ guest_unmap_l1e(pl1e);
+ return GNTST_general_error;
+ }
+
+ l1pg = mfn_to_page(gl1mfn);
+ if ( !page_lock(l1pg) )
+ {
+ put_page(l1pg);
+ guest_unmap_l1e(pl1e);
+ return GNTST_general_error;
+ }
+
+ if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+ {
+ page_unlock(l1pg);
+ put_page(l1pg);
+ guest_unmap_l1e(pl1e);
+ return GNTST_general_error;
+ }
+
+ ol1e = *pl1e;
+ okay = UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0);
+
+ page_unlock(l1pg);
+ put_page(l1pg);
+ guest_unmap_l1e(pl1e);
+
+ if ( okay && !paging_mode_refcounts(d) )
+ put_page_from_l1e(ol1e, d);
+
+ return okay ? GNTST_okay : GNTST_general_error;
+}
+
+static int replace_grant_va_mapping(
+ unsigned long addr, unsigned long frame, l1_pgentry_t nl1e, struct vcpu *v)
+{
+ l1_pgentry_t *pl1e, ol1e;
+ unsigned long gl1mfn;
+ struct page_info *l1pg;
+ int rc = 0;
+
+ pl1e = guest_map_l1e(addr, &gl1mfn);
+ if ( !pl1e )
+ {
+ MEM_LOG("Could not find L1 PTE for address %lx", addr);
+ return GNTST_general_error;
+ }
+
+ if ( !get_page_from_pagenr(gl1mfn, current->domain) )
+ {
+ rc = GNTST_general_error;
+ goto out;
+ }
+
+ l1pg = mfn_to_page(gl1mfn);
+ if ( !page_lock(l1pg) )
+ {
+ rc = GNTST_general_error;
+ put_page(l1pg);
+ goto out;
+ }
+
+ if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+ {
+ rc = GNTST_general_error;
+ goto unlock_and_out;
+ }
+
+ ol1e = *pl1e;
+
+ /* Check that the virtual address supplied is actually mapped to frame. */
+ if ( unlikely(l1e_get_pfn(ol1e) != frame) )
+ {
+ MEM_LOG("PTE entry %lx for address %lx doesn't match frame %lx",
+ l1e_get_pfn(ol1e), addr, frame);
+ rc = GNTST_general_error;
+ goto unlock_and_out;
+ }
+
+ /* Delete pagetable entry. */
+ if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, v, 0)) )
+ {
+ MEM_LOG("Cannot delete PTE entry at %p", (unsigned long *)pl1e);
+ rc = GNTST_general_error;
+ goto unlock_and_out;
+ }
+
+ unlock_and_out:
+ page_unlock(l1pg);
+ put_page(l1pg);
+ out:
+ guest_unmap_l1e(pl1e);
+ return rc;
+}
+
+static int create_grant_pte_mapping(
+ uint64_t pte_addr, l1_pgentry_t nl1e, struct vcpu *v)
+{
+ int rc = GNTST_okay;
+ void *va;
+ unsigned long gmfn, mfn;
+ struct page_info *page;
+ l1_pgentry_t ol1e;
+ struct domain *d = v->domain;
+
+ adjust_guest_l1e(nl1e, d);
+
+ gmfn = pte_addr >> PAGE_SHIFT;
+ page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
+
+ if ( unlikely(!page) )
+ {
+ MEM_LOG("Could not get page for normal update");
+ return GNTST_general_error;
+ }
+
+ mfn = page_to_mfn(page);
+ va = map_domain_page(_mfn(mfn));
+ va = (void *)((unsigned long)va + ((unsigned long)pte_addr & ~PAGE_MASK));
+
+ if ( !page_lock(page) )
+ {
+ rc = GNTST_general_error;
+ goto failed;
+ }
+
+ if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+ {
+ page_unlock(page);
+ rc = GNTST_general_error;
+ goto failed;
+ }
+
+ ol1e = *(l1_pgentry_t *)va;
+ if ( !UPDATE_ENTRY(l1, (l1_pgentry_t *)va, ol1e, nl1e, mfn, v, 0) )
+ {
+ page_unlock(page);
+ rc = GNTST_general_error;
+ goto failed;
+ }
+
+ page_unlock(page);
+
+ if ( !paging_mode_refcounts(d) )
+ put_page_from_l1e(ol1e, d);
+
+ failed:
+ unmap_domain_page(va);
+ put_page(page);
+
+ return rc;
+}
+
+static int destroy_grant_pte_mapping(
+ uint64_t addr, unsigned long frame, struct domain *d)
+{
+ int rc = GNTST_okay;
+ void *va;
+ unsigned long gmfn, mfn;
+ struct page_info *page;
+ l1_pgentry_t ol1e;
+
+ gmfn = addr >> PAGE_SHIFT;
+ page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
+
+ if ( unlikely(!page) )
+ {
+ MEM_LOG("Could not get page for normal update");
+ return GNTST_general_error;
+ }
+
+ mfn = page_to_mfn(page);
+ va = map_domain_page(_mfn(mfn));
+ va = (void *)((unsigned long)va + ((unsigned long)addr & ~PAGE_MASK));
+
+ if ( !page_lock(page) )
+ {
+ rc = GNTST_general_error;
+ goto failed;
+ }
+
+ if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+ {
+ page_unlock(page);
+ rc = GNTST_general_error;
+ goto failed;
+ }
+
+ ol1e = *(l1_pgentry_t *)va;
+
+ /* Check that the virtual address supplied is actually mapped to frame. */
+ if ( unlikely(l1e_get_pfn(ol1e) != frame) )
+ {
+ page_unlock(page);
+ MEM_LOG("PTE entry %lx for address %"PRIx64" doesn't match frame %lx",
+ (unsigned long)l1e_get_intpte(ol1e), addr, frame);
+ rc = GNTST_general_error;
+ goto failed;
+ }
+
+ /* Delete pagetable entry. */
+ if ( unlikely(!UPDATE_ENTRY
+ (l1,
+ (l1_pgentry_t *)va, ol1e, l1e_empty(), mfn,
+ d->vcpu[0] /* Change if we go to per-vcpu shadows. */,
+ 0)) )
+ {
+ page_unlock(page);
+ MEM_LOG("Cannot delete PTE entry at %p", va);
+ rc = GNTST_general_error;
+ goto failed;
+ }
+
+ page_unlock(page);
+
+ failed:
+ unmap_domain_page(va);
+ put_page(page);
+ return rc;
+}
+
+/* Read a PV guest's l1e that maps this virtual address. */
+static inline void guest_get_eff_l1e(unsigned long addr, l1_pgentry_t *eff_l1e)
+{
+ ASSERT(!paging_mode_translate(current->domain));
+ ASSERT(!paging_mode_external(current->domain));
+
+ if ( unlikely(!__addr_ok(addr)) ||
+ __copy_from_user(eff_l1e,
+ &__linear_l1_table[l1_linear_offset(addr)],
+ sizeof(l1_pgentry_t)) )
+ *eff_l1e = l1e_empty();
+}
+
+/*
+ * Read the guest's l1e that maps this address, from the kernel-mode
+ * page tables.
+ */
+static inline void guest_get_eff_kern_l1e(struct vcpu *v, unsigned long addr,
+ void *eff_l1e)
+{
+ bool_t user_mode = !(v->arch.flags & TF_kernel_mode);
+#define TOGGLE_MODE() if ( user_mode ) toggle_guest_mode(v)
+
+ TOGGLE_MODE();
+ guest_get_eff_l1e(addr, eff_l1e);
+ TOGGLE_MODE();
+}
+
+static int destroy_grant_va_mapping(
+ unsigned long addr, unsigned long frame, struct vcpu *v)
+{
+ return replace_grant_va_mapping(addr, frame, l1e_empty(), v);
+}
+
+int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
+ uint64_t new_addr, unsigned int flags)
+{
+ struct vcpu *curr = current;
+ l1_pgentry_t *pl1e, ol1e;
+ unsigned long gl1mfn;
+ struct page_info *l1pg;
+ int rc;
+
+ if ( flags & GNTMAP_contains_pte )
+ {
+ if ( !new_addr )
+ return destroy_grant_pte_mapping(addr, frame, curr->domain);
+
+ MEM_LOG("Unsupported grant table operation");
+ return GNTST_general_error;
+ }
+
+ if ( !new_addr )
+ return destroy_grant_va_mapping(addr, frame, curr);
+
+ pl1e = guest_map_l1e(new_addr, &gl1mfn);
+ if ( !pl1e )
+ {
+ MEM_LOG("Could not find L1 PTE for address %lx",
+ (unsigned long)new_addr);
+ return GNTST_general_error;
+ }
+
+ if ( !get_page_from_pagenr(gl1mfn, current->domain) )
+ {
+ guest_unmap_l1e(pl1e);
+ return GNTST_general_error;
+ }
+
+ l1pg = mfn_to_page(gl1mfn);
+ if ( !page_lock(l1pg) )
+ {
+ put_page(l1pg);
+ guest_unmap_l1e(pl1e);
+ return GNTST_general_error;
+ }
+
+ if ( (l1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+ {
+ page_unlock(l1pg);
+ put_page(l1pg);
+ guest_unmap_l1e(pl1e);
+ return GNTST_general_error;
+ }
+
+ ol1e = *pl1e;
+
+ if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, l1e_empty(),
+ gl1mfn, curr, 0)) )
+ {
+ page_unlock(l1pg);
+ put_page(l1pg);
+ MEM_LOG("Cannot delete PTE entry at %p", (unsigned long *)pl1e);
+ guest_unmap_l1e(pl1e);
+ return GNTST_general_error;
+ }
+
+ page_unlock(l1pg);
+ put_page(l1pg);
+ guest_unmap_l1e(pl1e);
+
+ rc = replace_grant_va_mapping(addr, frame, ol1e, curr);
+ if ( rc && !paging_mode_refcounts(curr->domain) )
+ put_page_from_l1e(ol1e, curr->domain);
+
+ return rc;
+}
+
+int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
+ unsigned int flags, unsigned int cache_flags)
+{
+ l1_pgentry_t pte;
+ uint32_t grant_pte_flags;
+
+ grant_pte_flags =
+ _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_GNTTAB;
+ if ( cpu_has_nx )
+ grant_pte_flags |= _PAGE_NX_BIT;
+
+ pte = l1e_from_pfn(frame, grant_pte_flags);
+ if ( (flags & GNTMAP_application_map) )
+        l1e_add_flags(pte, _PAGE_USER);
+    if ( !(flags & GNTMAP_readonly) )
+        l1e_add_flags(pte, _PAGE_RW);
+
+ l1e_add_flags(pte,
+ ((flags >> _GNTMAP_guest_avail0) * _PAGE_AVAIL0)
+ & _PAGE_AVAIL);
+
+ l1e_add_flags(pte, cacheattr_to_pte_flags(cache_flags >> 5));
+
+ if ( flags & GNTMAP_contains_pte )
+ return create_grant_pte_mapping(addr, pte, current);
+ return create_grant_va_mapping(addr, pte, current);
+}
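+
+/*
+ * For illustration (flag combination assumed, not mandated by callers): a
+ * writable grant mapping requested with GNTMAP_application_map and without
+ * GNTMAP_readonly ends up as a PTE carrying
+ * _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_GNTTAB |
+ * _PAGE_USER | _PAGE_RW, plus _PAGE_NX_BIT on NX-capable hardware.
+ */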
+
+/* Map shadow page at offset @off. */
+int map_ldt_shadow_page(unsigned int off)
+{
+ struct vcpu *v = current;
+ struct domain *d = v->domain;
+ unsigned long gmfn;
+ struct page_info *page;
+ l1_pgentry_t l1e, nl1e;
+ unsigned long gva = v->arch.pv_vcpu.ldt_base + (off << PAGE_SHIFT);
+ int okay;
+
+ BUG_ON(unlikely(in_irq()));
+
+ if ( is_pv_32bit_domain(d) )
+ gva = (u32)gva;
+ guest_get_eff_kern_l1e(v, gva, &l1e);
+ if ( unlikely(!(l1e_get_flags(l1e) & _PAGE_PRESENT)) )
+ return 0;
+
+ gmfn = l1e_get_pfn(l1e);
+ page = get_page_from_gfn(d, gmfn, NULL, P2M_ALLOC);
+ if ( unlikely(!page) )
+ return 0;
+
+ okay = get_page_type(page, PGT_seg_desc_page);
+ if ( unlikely(!okay) )
+ {
+ put_page(page);
+ return 0;
+ }
+
+ nl1e = l1e_from_pfn(page_to_mfn(page), l1e_get_flags(l1e) | _PAGE_RW);
+
+ spin_lock(&v->arch.pv_vcpu.shadow_ldt_lock);
+ l1e_write(&gdt_ldt_ptes(d, v)[off + 16], nl1e);
+ v->arch.pv_vcpu.shadow_ldt_mapcnt++;
+ spin_unlock(&v->arch.pv_vcpu.shadow_ldt_lock);
+
+ return 1;
+}
+
+int new_guest_cr3(unsigned long mfn)
+{
+ struct vcpu *curr = current;
+ struct domain *d = curr->domain;
+ int rc;
+ unsigned long old_base_mfn;
+
+ if ( is_pv_32bit_domain(d) )
+ {
+ unsigned long gt_mfn = pagetable_get_pfn(curr->arch.guest_table);
+ l4_pgentry_t *pl4e = map_domain_page(_mfn(gt_mfn));
+
+ rc = paging_mode_refcounts(d)
+ ? -EINVAL /* Old code was broken, but what should it be? */
+ : mod_l4_entry(
+ pl4e,
+ l4e_from_pfn(
+ mfn,
+ (_PAGE_PRESENT|_PAGE_RW|_PAGE_USER|_PAGE_ACCESSED)),
+ gt_mfn, 0, curr);
+ unmap_domain_page(pl4e);
+ switch ( rc )
+ {
+ case 0:
+ break;
+ case -EINTR:
+ case -ERESTART:
+ return -ERESTART;
+ default:
+ MEM_LOG("Error while installing new compat baseptr %lx", mfn);
+ return rc;
+ }
+
+ invalidate_shadow_ldt(curr, 0);
+ write_ptbase(curr);
+
+ return 0;
+ }
+
+ rc = put_old_guest_table(curr);
+ if ( unlikely(rc) )
+ return rc;
+
+ old_base_mfn = pagetable_get_pfn(curr->arch.guest_table);
+ /*
+ * This is particularly important when getting restarted after the
+ * previous attempt got preempted in the put-old-MFN phase.
+ */
+ if ( old_base_mfn == mfn )
+ {
+ write_ptbase(curr);
+ return 0;
+ }
+
+ rc = paging_mode_refcounts(d)
+ ? (get_page_from_pagenr(mfn, d) ? 0 : -EINVAL)
+ : get_page_and_type_from_pagenr(mfn, PGT_root_page_table, d, 0, 1);
+ switch ( rc )
+ {
+ case 0:
+ break;
+ case -EINTR:
+ case -ERESTART:
+ return -ERESTART;
+ default:
+ MEM_LOG("Error while installing new baseptr %lx", mfn);
+ return rc;
+ }
+
+ invalidate_shadow_ldt(curr, 0);
+
+ if ( !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) )
+ fill_ro_mpt(mfn);
+ curr->arch.guest_table = pagetable_from_pfn(mfn);
+ update_cr3(curr);
+
+ write_ptbase(curr);
+
+ if ( likely(old_base_mfn != 0) )
+ {
+ struct page_info *page = mfn_to_page(old_base_mfn);
+
+ if ( paging_mode_refcounts(d) )
+ put_page(page);
+ else
+ switch ( rc = put_page_and_type_preemptible(page) )
+ {
+ case -EINTR:
+ rc = -ERESTART;
+ /* fallthrough */
+ case -ERESTART:
+ curr->arch.old_guest_table = page;
+ break;
+ default:
+ BUG_ON(rc);
+ break;
+ }
+ }
+
+ return rc;
+}
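+
+/*
+ * For illustration: the usual way a PV guest reaches the path above is the
+ * MMUEXT_NEW_BASEPTR mmuext_op (guest-side sketch assuming the common
+ * HYPERVISOR_mmuext_op wrapper; the MFN is a placeholder):
+ *
+ *     struct mmuext_op op = {
+ *         .cmd = MMUEXT_NEW_BASEPTR,
+ *         .arg1.mfn = new_top_level_mfn,
+ *     };
+ *     HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF);
+ */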
+
+/*************************
+ * Writable Pagetables
+ */
+
+struct ptwr_emulate_ctxt {
+ struct x86_emulate_ctxt ctxt;
+ unsigned long cr2;
+ l1_pgentry_t pte;
+};
+
+static int ptwr_emulated_read(
+ enum x86_segment seg,
+ unsigned long offset,
+ void *p_data,
+ unsigned int bytes,
+ struct x86_emulate_ctxt *ctxt)
+{
+ unsigned int rc = bytes;
+ unsigned long addr = offset;
+
+ if ( !__addr_ok(addr) ||
+ (rc = __copy_from_user(p_data, (void *)addr, bytes)) )
+ {
+ x86_emul_pagefault(0, addr + bytes - rc, ctxt); /* Read fault. */
+ return X86EMUL_EXCEPTION;
+ }
+
+ return X86EMUL_OKAY;
+}
+
+static int ptwr_emulated_update(
+ unsigned long addr,
+ paddr_t old,
+ paddr_t val,
+ unsigned int bytes,
+ unsigned int do_cmpxchg,
+ struct ptwr_emulate_ctxt *ptwr_ctxt)
+{
+ unsigned long mfn;
+ unsigned long unaligned_addr = addr;
+ struct page_info *page;
+ l1_pgentry_t pte, ol1e, nl1e, *pl1e;
+ struct vcpu *v = current;
+ struct domain *d = v->domain;
+ int ret;
+
+ /* Only allow naturally-aligned stores within the original %cr2 page. */
+ if ( unlikely(((addr^ptwr_ctxt->cr2) & PAGE_MASK) || (addr & (bytes-1))) )
+ {
+ MEM_LOG("ptwr_emulate: bad access (cr2=%lx, addr=%lx, bytes=%u)",
+ ptwr_ctxt->cr2, addr, bytes);
+ return X86EMUL_UNHANDLEABLE;
+ }
+
+ /* Turn a sub-word access into a full-word access. */
+ if ( bytes != sizeof(paddr_t) )
+ {
+ paddr_t full;
+ unsigned int rc, offset = addr & (sizeof(paddr_t)-1);
+
+ /* Align address; read full word. */
+ addr &= ~(sizeof(paddr_t)-1);
+ if ( (rc = copy_from_user(&full, (void *)addr, sizeof(paddr_t))) != 0 )
+ {
+ x86_emul_pagefault(0, /* Read fault. */
+ addr + sizeof(paddr_t) - rc,
+ &ptwr_ctxt->ctxt);
+ return X86EMUL_EXCEPTION;
+ }
+ /* Mask out bits provided by caller. */
+ full &= ~((((paddr_t)1 << (bytes*8)) - 1) << (offset*8));
+ /* Shift the caller value and OR in the missing bits. */
+ val &= (((paddr_t)1 << (bytes*8)) - 1);
+ val <<= (offset)*8;
+ val |= full;
+ /* Also fill in missing parts of the cmpxchg old value. */
+ old &= (((paddr_t)1 << (bytes*8)) - 1);
+ old <<= (offset)*8;
+ old |= full;
+ }
+
+ pte = ptwr_ctxt->pte;
+ mfn = l1e_get_pfn(pte);
+ page = mfn_to_page(mfn);
+
+ /* We are looking only for read-only mappings of p.t. pages. */
+ ASSERT((l1e_get_flags(pte) & (_PAGE_RW|_PAGE_PRESENT)) == _PAGE_PRESENT);
+ ASSERT(mfn_valid(_mfn(mfn)));
+ ASSERT((page->u.inuse.type_info & PGT_type_mask) == PGT_l1_page_table);
+ ASSERT((page->u.inuse.type_info & PGT_count_mask) != 0);
+ ASSERT(page_get_owner(page) == d);
+
+ /* Check the new PTE. */
+ nl1e = l1e_from_intpte(val);
+ switch ( ret = get_page_from_l1e(nl1e, d, d) )
+ {
+ default:
+ if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
+ !do_cmpxchg && (l1e_get_flags(nl1e) & _PAGE_PRESENT) )
+ {
+ /*
+ * If this is an upper-half write to a PAE PTE then we assume that
+ * the guest has simply got the two writes the wrong way round. We
+ * zap the PRESENT bit on the assumption that the bottom half will
+ * be written immediately after we return to the guest.
+ */
+ gdprintk(XENLOG_DEBUG, "ptwr_emulate: fixing up invalid PAE PTE %"
+ PRIpte"\n", l1e_get_intpte(nl1e));
+ l1e_remove_flags(nl1e, _PAGE_PRESENT);
+ }
+ else
+ {
+ MEM_LOG("ptwr_emulate: could not get_page_from_l1e()");
+ return X86EMUL_UNHANDLEABLE;
+ }
+ break;
+ case 0:
+ break;
+ case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+ ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+ l1e_flip_flags(nl1e, ret);
+ break;
+ }
+
+ adjust_guest_l1e(nl1e, d);
+
+ /* Checked successfully: do the update (write or cmpxchg). */
+ pl1e = map_domain_page(_mfn(mfn));
+ pl1e = (l1_pgentry_t *)((unsigned long)pl1e + (addr & ~PAGE_MASK));
+ if ( do_cmpxchg )
+ {
+ int okay;
+ intpte_t t = old;
+ ol1e = l1e_from_intpte(old);
+
+ okay = paging_cmpxchg_guest_entry(v, &l1e_get_intpte(*pl1e),
+ &t, l1e_get_intpte(nl1e), _mfn(mfn));
+ okay = (okay && t == old);
+
+ if ( !okay )
+ {
+ unmap_domain_page(pl1e);
+ put_page_from_l1e(nl1e, d);
+ return X86EMUL_RETRY;
+ }
+ }
+ else
+ {
+ ol1e = *pl1e;
+ if ( !UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, mfn, v, 0) )
+ BUG();
+ }
+
+ trace_ptwr_emulation(addr, nl1e);
+
+ unmap_domain_page(pl1e);
+
+ /* Finally, drop the old PTE. */
+ put_page_from_l1e(ol1e, d);
+
+ return X86EMUL_OKAY;
+}
+
+static int ptwr_emulated_write(
+ enum x86_segment seg,
+ unsigned long offset,
+ void *p_data,
+ unsigned int bytes,
+ struct x86_emulate_ctxt *ctxt)
+{
+ paddr_t val = 0;
+
+ if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) || !bytes )
+ {
+ MEM_LOG("ptwr_emulate: bad write size (addr=%lx, bytes=%u)",
+ offset, bytes);
+ return X86EMUL_UNHANDLEABLE;
+ }
+
+ memcpy(&val, p_data, bytes);
+
+ return ptwr_emulated_update(
+ offset, 0, val, bytes, 0,
+ container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
+}
+
+static int ptwr_emulated_cmpxchg(
+ enum x86_segment seg,
+ unsigned long offset,
+ void *p_old,
+ void *p_new,
+ unsigned int bytes,
+ struct x86_emulate_ctxt *ctxt)
+{
+ paddr_t old = 0, new = 0;
+
+    if ( (bytes > sizeof(paddr_t)) || (bytes & (bytes - 1)) )
+ {
+ MEM_LOG("ptwr_emulate: bad cmpxchg size (addr=%lx, bytes=%u)",
+ offset, bytes);
+ return X86EMUL_UNHANDLEABLE;
+ }
+
+ memcpy(&old, p_old, bytes);
+ memcpy(&new, p_new, bytes);
+
+ return ptwr_emulated_update(
+ offset, old, new, bytes, 1,
+ container_of(ctxt, struct ptwr_emulate_ctxt, ctxt));
+}
+
+static int pv_emul_is_mem_write(const struct x86_emulate_state *state,
+ struct x86_emulate_ctxt *ctxt)
+{
+ return x86_insn_is_mem_write(state, ctxt) ? X86EMUL_OKAY
+ : X86EMUL_UNHANDLEABLE;
+}
+
+static const struct x86_emulate_ops ptwr_emulate_ops = {
+ .read = ptwr_emulated_read,
+ .insn_fetch = ptwr_emulated_read,
+ .write = ptwr_emulated_write,
+ .cmpxchg = ptwr_emulated_cmpxchg,
+ .validate = pv_emul_is_mem_write,
+ .cpuid = pv_emul_cpuid,
+};
+
+/* Write page fault handler: check if guest is trying to modify a PTE. */
+int ptwr_do_page_fault(struct vcpu *v, unsigned long addr,
+ struct cpu_user_regs *regs)
+{
+ struct domain *d = v->domain;
+ struct page_info *page;
+ l1_pgentry_t pte;
+ struct ptwr_emulate_ctxt ptwr_ctxt = {
+ .ctxt = {
+ .regs = regs,
+ .vendor = d->arch.cpuid->x86_vendor,
+ .addr_size = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
+ .sp_size = is_pv_32bit_domain(d) ? 32 : BITS_PER_LONG,
+ .swint_emulate = x86_swint_emulate_none,
+ },
+ };
+ int rc;
+
+ /* Attempt to read the PTE that maps the VA being accessed. */
+ guest_get_eff_l1e(addr, &pte);
+
+ /* We are looking only for read-only mappings of p.t. pages. */
+ if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) ||
+ rangeset_contains_singleton(mmio_ro_ranges, l1e_get_pfn(pte)) ||
+ !get_page_from_pagenr(l1e_get_pfn(pte), d) )
+ goto bail;
+
+ page = l1e_get_page(pte);
+ if ( !page_lock(page) )
+ {
+ put_page(page);
+ goto bail;
+ }
+
+ if ( (page->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+ {
+ page_unlock(page);
+ put_page(page);
+ goto bail;
+ }
+
+ ptwr_ctxt.cr2 = addr;
+ ptwr_ctxt.pte = pte;
+
+ rc = x86_emulate(&ptwr_ctxt.ctxt, &ptwr_emulate_ops);
+
+ page_unlock(page);
+ put_page(page);
+
+ switch ( rc )
+ {
+ case X86EMUL_EXCEPTION:
+ /*
+ * This emulation only covers writes to pagetables which are marked
+ * read-only by Xen. We tolerate #PF (in case a concurrent pagetable
+ * update has succeeded on a different vcpu). Anything else is an
+ * emulation bug, or a guest playing with the instruction stream under
+ * Xen's feet.
+ */
+ if ( ptwr_ctxt.ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
+ ptwr_ctxt.ctxt.event.vector == TRAP_page_fault )
+ pv_inject_event(&ptwr_ctxt.ctxt.event);
+ else
+ gdprintk(XENLOG_WARNING,
+ "Unexpected event (type %u, vector %#x) from emulation\n",
+ ptwr_ctxt.ctxt.event.type, ptwr_ctxt.ctxt.event.vector);
+
+ /* Fallthrough */
+ case X86EMUL_OKAY:
+
+ if ( ptwr_ctxt.ctxt.retire.singlestep )
+ pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
+
+ /* Fallthrough */
+ case X86EMUL_RETRY:
+ perfc_incr(ptwr_emulations);
+ return EXCRET_fault_fixed;
+ }
+
+ bail:
+ return 0;
+}
+
+
+/*************************
+ * fault handling for read-only MMIO pages
+ */
+
+int mmio_ro_emulated_write(
+ enum x86_segment seg,
+ unsigned long offset,
+ void *p_data,
+ unsigned int bytes,
+ struct x86_emulate_ctxt *ctxt)
+{
+ struct mmio_ro_emulate_ctxt *mmio_ro_ctxt = ctxt->data;
+
+ /* Only allow naturally-aligned stores at the original %cr2 address. */
+ if ( ((bytes | offset) & (bytes - 1)) || !bytes ||
+ offset != mmio_ro_ctxt->cr2 )
+ {
+ MEM_LOG("mmio_ro_emulate: bad access (cr2=%lx, addr=%lx, bytes=%u)",
+ mmio_ro_ctxt->cr2, offset, bytes);
+ return X86EMUL_UNHANDLEABLE;
+ }
+
+ return X86EMUL_OKAY;
+}
+
+static const struct x86_emulate_ops mmio_ro_emulate_ops = {
+ .read = x86emul_unhandleable_rw,
+ .insn_fetch = ptwr_emulated_read,
+ .write = mmio_ro_emulated_write,
+ .validate = pv_emul_is_mem_write,
+ .cpuid = pv_emul_cpuid,
+};
+
+int mmcfg_intercept_write(
+ enum x86_segment seg,
+ unsigned long offset,
+ void *p_data,
+ unsigned int bytes,
+ struct x86_emulate_ctxt *ctxt)
+{
+ struct mmio_ro_emulate_ctxt *mmio_ctxt = ctxt->data;
+
+ /*
+ * Only allow naturally-aligned stores no wider than 4 bytes to the
+ * original %cr2 address.
+ */
+ if ( ((bytes | offset) & (bytes - 1)) || bytes > 4 || !bytes ||
+ offset != mmio_ctxt->cr2 )
+ {
+ MEM_LOG("mmcfg_intercept: bad write (cr2=%lx, addr=%lx, bytes=%u)",
+ mmio_ctxt->cr2, offset, bytes);
+ return X86EMUL_UNHANDLEABLE;
+ }
+
+ offset &= 0xfff;
+ if ( pci_conf_write_intercept(mmio_ctxt->seg, mmio_ctxt->bdf,
+ offset, bytes, p_data) >= 0 )
+ pci_mmcfg_write(mmio_ctxt->seg, PCI_BUS(mmio_ctxt->bdf),
+ PCI_DEVFN2(mmio_ctxt->bdf), offset, bytes,
+ *(uint32_t *)p_data);
+
+ return X86EMUL_OKAY;
+}
+
+static const struct x86_emulate_ops mmcfg_intercept_ops = {
+ .read = x86emul_unhandleable_rw,
+ .insn_fetch = ptwr_emulated_read,
+ .write = mmcfg_intercept_write,
+ .validate = pv_emul_is_mem_write,
+ .cpuid = pv_emul_cpuid,
+};
+
+/* Check if guest is trying to modify a r/o MMIO page. */
+int mmio_ro_do_page_fault(struct vcpu *v, unsigned long addr,
+ struct cpu_user_regs *regs)
+{
+ l1_pgentry_t pte;
+ unsigned long mfn;
+ unsigned int addr_size = is_pv_32bit_vcpu(v) ? 32 : BITS_PER_LONG;
+ struct mmio_ro_emulate_ctxt mmio_ro_ctxt = { .cr2 = addr };
+ struct x86_emulate_ctxt ctxt = {
+ .regs = regs,
+ .vendor = v->domain->arch.cpuid->x86_vendor,
+ .addr_size = addr_size,
+ .sp_size = addr_size,
+ .swint_emulate = x86_swint_emulate_none,
+ .data = &mmio_ro_ctxt
+ };
+ int rc;
+
+ /* Attempt to read the PTE that maps the VA being accessed. */
+ guest_get_eff_l1e(addr, &pte);
+
+ /* We are looking only for read-only mappings of MMIO pages. */
+ if ( ((l1e_get_flags(pte) & (_PAGE_PRESENT|_PAGE_RW)) != _PAGE_PRESENT) )
+ return 0;
+
+ mfn = l1e_get_pfn(pte);
+ if ( mfn_valid(_mfn(mfn)) )
+ {
+ struct page_info *page = mfn_to_page(mfn);
+ struct domain *owner = page_get_owner_and_reference(page);
+
+ if ( owner )
+ put_page(page);
+ if ( owner != dom_io )
+ return 0;
+ }
+
+ if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+ return 0;
+
+ if ( pci_ro_mmcfg_decode(mfn, &mmio_ro_ctxt.seg, &mmio_ro_ctxt.bdf) )
+ rc = x86_emulate(&ctxt, &mmcfg_intercept_ops);
+ else
+ rc = x86_emulate(&ctxt, &mmio_ro_emulate_ops);
+
+ switch ( rc )
+ {
+ case X86EMUL_EXCEPTION:
+ /*
+ * This emulation only covers writes to MMCFG space or read-only MFNs.
+ * We tolerate #PF (from hitting an adjacent page or a successful
+ * concurrent pagetable update). Anything else is an emulation bug,
+ * or a guest playing with the instruction stream under Xen's feet.
+ */
+ if ( ctxt.event.type == X86_EVENTTYPE_HW_EXCEPTION &&
+ ctxt.event.vector == TRAP_page_fault )
+ pv_inject_event(&ctxt.event);
+ else
+ gdprintk(XENLOG_WARNING,
+ "Unexpected event (type %u, vector %#x) from emulation\n",
+ ctxt.event.type, ctxt.event.vector);
+
+ /* Fallthrough */
+ case X86EMUL_OKAY:
+
+ if ( ctxt.retire.singlestep )
+ pv_inject_hw_exception(TRAP_debug, X86_EVENT_NO_EC);
+
+ /* Fallthrough */
+ case X86EMUL_RETRY:
+ perfc_incr(ptwr_emulations);
+ return EXCRET_fault_fixed;
+ }
+
+ return 0;
+}
+
+/*************************
+ * PV MMU hypercalls
+ */
+long do_mmu_update(
+ XEN_GUEST_HANDLE_PARAM(mmu_update_t) ureqs,
+ unsigned int count,
+ XEN_GUEST_HANDLE_PARAM(uint) pdone,
+ unsigned int foreigndom)
+{
+ struct mmu_update req;
+ void *va;
+ unsigned long gpfn, gmfn, mfn;
+ struct page_info *page;
+ unsigned int cmd, i = 0, done = 0, pt_dom;
+ struct vcpu *curr = current, *v = curr;
+ struct domain *d = v->domain, *pt_owner = d, *pg_owner;
+ struct domain_mmap_cache mapcache;
+ uint32_t xsm_needed = 0;
+ uint32_t xsm_checked = 0;
+ int rc = put_old_guest_table(curr);
+
+ if ( unlikely(rc) )
+ {
+ if ( likely(rc == -ERESTART) )
+ rc = hypercall_create_continuation(
+ __HYPERVISOR_mmu_update, "hihi", ureqs, count, pdone,
+ foreigndom);
+ return rc;
+ }
+
+ if ( unlikely(count == MMU_UPDATE_PREEMPTED) &&
+ likely(guest_handle_is_null(ureqs)) )
+ {
+        /*
+         * See the curr->arch.old_guest_table related
+         * hypercall_create_continuation() below.
+         */
+ return (int)foreigndom;
+ }
+
+ if ( unlikely(count & MMU_UPDATE_PREEMPTED) )
+ {
+ count &= ~MMU_UPDATE_PREEMPTED;
+ if ( unlikely(!guest_handle_is_null(pdone)) )
+ (void)copy_from_guest(&done, pdone, 1);
+ }
+ else
+ perfc_incr(calls_to_mmu_update);
+
+ if ( unlikely(!guest_handle_okay(ureqs, count)) )
+ return -EFAULT;
+
+ if ( (pt_dom = foreigndom >> 16) != 0 )
+ {
+ /* Pagetables belong to a foreign domain (PFD). */
+ if ( (pt_owner = rcu_lock_domain_by_id(pt_dom - 1)) == NULL )
+ return -ESRCH;
+
+ if ( pt_owner == d )
+ rcu_unlock_domain(pt_owner);
+ else if ( !pt_owner->vcpu || (v = pt_owner->vcpu[0]) == NULL )
+ {
+ rc = -EINVAL;
+ goto out;
+ }
+ }
+
+ if ( (pg_owner = mm_get_pg_owner((uint16_t)foreigndom)) == NULL )
+ {
+ rc = -ESRCH;
+ goto out;
+ }
+
+ domain_mmap_cache_init(&mapcache);
+
+ for ( i = 0; i < count; i++ )
+ {
+ if ( curr->arch.old_guest_table || (i && hypercall_preempt_check()) )
+ {
+ rc = -ERESTART;
+ break;
+ }
+
+ if ( unlikely(__copy_from_guest(&req, ureqs, 1) != 0) )
+ {
+ MEM_LOG("Bad __copy_from_guest");
+ rc = -EFAULT;
+ break;
+ }
+
+ cmd = req.ptr & (sizeof(l1_pgentry_t)-1);
+
+ switch ( cmd )
+ {
+ /*
+ * MMU_NORMAL_PT_UPDATE: Normal update to any level of page table.
+         * MMU_PT_UPDATE_PRESERVE_AD: As above but also preserve (OR in)
+         * the current A/D bits.
+ */
+ case MMU_NORMAL_PT_UPDATE:
+ case MMU_PT_UPDATE_PRESERVE_AD:
+ {
+ p2m_type_t p2mt;
+
+ rc = -EOPNOTSUPP;
+ if ( unlikely(paging_mode_refcounts(pt_owner)) )
+ break;
+
+ xsm_needed |= XSM_MMU_NORMAL_UPDATE;
+ if ( get_pte_flags(req.val) & _PAGE_PRESENT )
+ {
+ xsm_needed |= XSM_MMU_UPDATE_READ;
+ if ( get_pte_flags(req.val) & _PAGE_RW )
+ xsm_needed |= XSM_MMU_UPDATE_WRITE;
+ }
+ if ( xsm_needed != xsm_checked )
+ {
+ rc = xsm_mmu_update(XSM_TARGET, d, pt_owner, pg_owner, xsm_needed);
+ if ( rc )
+ break;
+ xsm_checked = xsm_needed;
+ }
+ rc = -EINVAL;
+
+ req.ptr -= cmd;
+ gmfn = req.ptr >> PAGE_SHIFT;
+ page = get_page_from_gfn(pt_owner, gmfn, &p2mt, P2M_ALLOC);
+
+ if ( p2m_is_paged(p2mt) )
+ {
+ ASSERT(!page);
+ p2m_mem_paging_populate(pg_owner, gmfn);
+ rc = -ENOENT;
+ break;
+ }
+
+ if ( unlikely(!page) )
+ {
+ MEM_LOG("Could not get page for normal update");
+ break;
+ }
+
+ mfn = page_to_mfn(page);
+ va = map_domain_page_with_cache(mfn, &mapcache);
+ va = (void *)((unsigned long)va +
+ (unsigned long)(req.ptr & ~PAGE_MASK));
+
+ if ( page_lock(page) )
+ {
+ switch ( page->u.inuse.type_info & PGT_type_mask )
+ {
+ case PGT_l1_page_table:
+ {
+ l1_pgentry_t l1e = l1e_from_intpte(req.val);
+ p2m_type_t l1e_p2mt = p2m_ram_rw;
+ struct page_info *target = NULL;
+ p2m_query_t q = (l1e_get_flags(l1e) & _PAGE_RW) ?
+ P2M_UNSHARE : P2M_ALLOC;
+
+ if ( paging_mode_translate(pg_owner) )
+ target = get_page_from_gfn(pg_owner, l1e_get_pfn(l1e),
+ &l1e_p2mt, q);
+
+ if ( p2m_is_paged(l1e_p2mt) )
+ {
+ if ( target )
+ put_page(target);
+ p2m_mem_paging_populate(pg_owner, l1e_get_pfn(l1e));
+ rc = -ENOENT;
+ break;
+ }
+ else if ( p2m_ram_paging_in == l1e_p2mt && !target )
+ {
+ rc = -ENOENT;
+ break;
+ }
+ /* If we tried to unshare and failed */
+ else if ( (q & P2M_UNSHARE) && p2m_is_shared(l1e_p2mt) )
+ {
+ /* We could not have obtained a page ref. */
+ ASSERT(target == NULL);
+ /* And mem_sharing_notify has already been called. */
+ rc = -ENOMEM;
+ break;
+ }
+
+ rc = mod_l1_entry(va, l1e, mfn,
+ cmd == MMU_PT_UPDATE_PRESERVE_AD, v,
+ pg_owner);
+ if ( target )
+ put_page(target);
+ }
+ break;
+ case PGT_l2_page_table:
+ rc = mod_l2_entry(va, l2e_from_intpte(req.val), mfn,
+ cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
+ break;
+ case PGT_l3_page_table:
+ rc = mod_l3_entry(va, l3e_from_intpte(req.val), mfn,
+ cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
+ break;
+ case PGT_l4_page_table:
+ rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
+ cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
+ break;
+ case PGT_writable_page:
+ perfc_incr(writable_mmu_updates);
+ if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) )
+ rc = 0;
+ break;
+ }
+ page_unlock(page);
+ if ( rc == -EINTR )
+ rc = -ERESTART;
+ }
+ else if ( get_page_type(page, PGT_writable_page) )
+ {
+ perfc_incr(writable_mmu_updates);
+ if ( paging_write_guest_entry(v, va, req.val, _mfn(mfn)) )
+ rc = 0;
+ put_page_type(page);
+ }
+
+ unmap_domain_page_with_cache(va, &mapcache);
+ put_page(page);
+ }
+ break;
+
+ case MMU_MACHPHYS_UPDATE:
+ if ( unlikely(d != pt_owner) )
+ {
+ rc = -EPERM;
+ break;
+ }
+
+ if ( unlikely(paging_mode_translate(pg_owner)) )
+ {
+ rc = -EINVAL;
+ break;
+ }
+
+ mfn = req.ptr >> PAGE_SHIFT;
+ gpfn = req.val;
+
+ xsm_needed |= XSM_MMU_MACHPHYS_UPDATE;
+ if ( xsm_needed != xsm_checked )
+ {
+ rc = xsm_mmu_update(XSM_TARGET, d, NULL, pg_owner, xsm_needed);
+ if ( rc )
+ break;
+ xsm_checked = xsm_needed;
+ }
+
+ if ( unlikely(!get_page_from_pagenr(mfn, pg_owner)) )
+ {
+ MEM_LOG("Could not get page for mach->phys update");
+ rc = -EINVAL;
+ break;
+ }
+
+ set_gpfn_from_mfn(mfn, gpfn);
+
+ paging_mark_dirty(pg_owner, _mfn(mfn));
+
+ put_page(mfn_to_page(mfn));
+ break;
+
+ default:
+ MEM_LOG("Invalid page update command %x", cmd);
+ rc = -ENOSYS;
+ break;
+ }
+
+ if ( unlikely(rc) )
+ break;
+
+ guest_handle_add_offset(ureqs, 1);
+ }
+
+ if ( rc == -ERESTART )
+ {
+ ASSERT(i < count);
+ rc = hypercall_create_continuation(
+ __HYPERVISOR_mmu_update, "hihi",
+ ureqs, (count - i) | MMU_UPDATE_PREEMPTED, pdone, foreigndom);
+ }
+ else if ( curr->arch.old_guest_table )
+ {
+ XEN_GUEST_HANDLE_PARAM(void) null;
+
+ ASSERT(rc || i == count);
+ set_xen_guest_handle(null, NULL);
+ /*
+ * In order to have a way to communicate the final return value to
+ * our continuation, we pass this in place of "foreigndom", building
+ * on the fact that this argument isn't needed anymore.
+ */
+ rc = hypercall_create_continuation(
+ __HYPERVISOR_mmu_update, "hihi", null,
+ MMU_UPDATE_PREEMPTED, null, rc);
+ }
+
+ mm_put_pg_owner(pg_owner);
+
+ domain_mmap_cache_destroy(&mapcache);
+
+ perfc_add(num_page_updates, i);
+
+ out:
+ if ( pt_owner != d )
+ rcu_unlock_domain(pt_owner);
+
+ /* Add incremental work we have done to the @done output parameter. */
+ if ( unlikely(!guest_handle_is_null(pdone)) )
+ {
+ done += i;
+ copy_to_guest(pdone, &done, 1);
+ }
+
+ return rc;
+}
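+
+/*
+ * A minimal guest-side sketch of the hypercall above, assuming the usual
+ * HYPERVISOR_mmu_update wrapper (the machine address and PTE value are
+ * placeholders):
+ *
+ *     mmu_update_t req = {
+ *         .ptr = pte_machine_addr | MMU_NORMAL_PT_UPDATE,
+ *         .val = new_pte_value,
+ *     };
+ *     HYPERVISOR_mmu_update(&req, 1, NULL, DOMID_SELF);
+ *
+ * The low bits of .ptr carry the command, exactly as decoded from req.ptr
+ * at the top of the loop above.
+ */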
+
+static int __do_update_va_mapping(
+ unsigned long va, u64 val64, unsigned long flags, struct domain *pg_owner)
+{
+ l1_pgentry_t val = l1e_from_intpte(val64);
+ struct vcpu *v = current;
+ struct domain *d = v->domain;
+ struct page_info *gl1pg;
+ l1_pgentry_t *pl1e;
+ unsigned long bmap_ptr, gl1mfn;
+ cpumask_t *mask = NULL;
+ int rc;
+
+ perfc_incr(calls_to_update_va);
+
+ rc = xsm_update_va_mapping(XSM_TARGET, d, pg_owner, val);
+ if ( rc )
+ return rc;
+
+ rc = -EINVAL;
+ pl1e = guest_map_l1e(va, &gl1mfn);
+ if ( unlikely(!pl1e || !get_page_from_pagenr(gl1mfn, d)) )
+ goto out;
+
+ gl1pg = mfn_to_page(gl1mfn);
+ if ( !page_lock(gl1pg) )
+ {
+ put_page(gl1pg);
+ goto out;
+ }
+
+ if ( (gl1pg->u.inuse.type_info & PGT_type_mask) != PGT_l1_page_table )
+ {
+ page_unlock(gl1pg);
+ put_page(gl1pg);
+ goto out;
+ }
+
+ rc = mod_l1_entry(pl1e, val, gl1mfn, 0, v, pg_owner);
+
+ page_unlock(gl1pg);
+ put_page(gl1pg);
+
+ out:
+ if ( pl1e )
+ guest_unmap_l1e(pl1e);
+
+ switch ( flags & UVMF_FLUSHTYPE_MASK )
+ {
+ case UVMF_TLB_FLUSH:
+ switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) )
+ {
+ case UVMF_LOCAL:
+ flush_tlb_local();
+ break;
+ case UVMF_ALL:
+ mask = d->domain_dirty_cpumask;
+ break;
+ default:
+ mask = this_cpu(scratch_cpumask);
+ rc = mm_vcpumask_to_pcpumask(d,
+ const_guest_handle_from_ptr(bmap_ptr,
+ void),
+ mask);
+ break;
+ }
+ if ( mask )
+ flush_tlb_mask(mask);
+ break;
+
+ case UVMF_INVLPG:
+ switch ( (bmap_ptr = flags & ~UVMF_FLUSHTYPE_MASK) )
+ {
+ case UVMF_LOCAL:
+ paging_invlpg(v, va);
+ break;
+ case UVMF_ALL:
+ mask = d->domain_dirty_cpumask;
+ break;
+ default:
+ mask = this_cpu(scratch_cpumask);
+ rc = mm_vcpumask_to_pcpumask(d,
+ const_guest_handle_from_ptr(bmap_ptr,
+ void),
+ mask);
+ break;
+ }
+ if ( mask )
+ flush_tlb_one_mask(mask, va);
+ break;
+ }
+
+ return rc;
+}
+
+long do_update_va_mapping(unsigned long va, u64 val64,
+ unsigned long flags)
+{
+ return __do_update_va_mapping(va, val64, flags, current->domain);
+}
+
+long do_update_va_mapping_otherdomain(unsigned long va, u64 val64,
+ unsigned long flags,
+ domid_t domid)
+{
+ struct domain *pg_owner;
+ int rc;
+
+ if ( (pg_owner = mm_get_pg_owner(domid)) == NULL )
+ return -ESRCH;
+
+ rc = __do_update_va_mapping(va, val64, flags, pg_owner);
+
+ mm_put_pg_owner(pg_owner);
+
+ return rc;
+}
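+
+/*
+ * Guest-side usage sketch for the hypercall above, assuming the common
+ * HYPERVISOR_update_va_mapping wrapper (values illustrative):
+ *
+ *     HYPERVISOR_update_va_mapping(va, new_pte_value,
+ *                                  UVMF_INVLPG | UVMF_LOCAL);
+ *
+ * remaps a single linear address and invalidates the local TLB entry,
+ * matching the UVMF_INVLPG / UVMF_LOCAL handling at the end of
+ * __do_update_va_mapping().
+ */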
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 967b7fcda9..170908f7f2 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -649,6 +649,11 @@ int mm_vcpumask_to_pcpumask(struct domain *d,
XEN_GUEST_HANDLE_PARAM(const_void) bmap,
cpumask_t *pmask);
+int create_grant_pv_mapping(uint64_t addr, unsigned long frame,
+ unsigned int flags, unsigned int cache_flags);
+int replace_grant_pv_mapping(uint64_t addr, unsigned long frame,
+ uint64_t new_addr, unsigned int flags);
+
#define MEM_LOG(_f, _a...) gdprintk(XENLOG_WARNING , _f "\n" , ## _a)
#define PAGE_CACHE_ATTRS (_PAGE_PAT|_PAGE_PCD|_PAGE_PWT)
--
2.11.0