* Xen Security Advisory 154 (CVE-2016-2270) - x86: inconsistent cachability flags on guest mappings
@ 2016-02-17 12:28 Xen.org security team
From: Xen.org security team @ 2016-02-17 12:28 UTC (permalink / raw)
  To: xen-announce, xen-devel, xen-users, oss-security; +Cc: Xen.org security team

[-- Attachment #1: Type: text/plain, Size: 4977 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

            Xen Security Advisory CVE-2016-2270 / XSA-154
                              version 3

          x86: inconsistent cachability flags on guest mappings

UPDATES IN VERSION 3
====================

Clarify cumbersome Resolution wording.

The patch now adds a command line option to overcome the possible
performance regression.  Add patch backports.

Clarify origin of the assertion (at the start of the patch description)
that inconsistent cacheability is a problem only for MMIO pages.

Public release.

ISSUE DESCRIPTION
=================

Multiple mappings of the same physical page with different cachability
settings can cause problems.  While one category (the risk of using
stale data) affects only guests themselves (and hence avoiding it can
be left for them to control), the other category, Machine Check
exceptions, can be fatal to the entire host.  According to the
information we were able to gather, only mappings of MMIO pages may
surface this second category, but even for those there were cases
where the hypervisor did not properly enforce consistent cachability.
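
As background (an illustrative sketch only, not part of the advisory
or of the patches below): on x86 the effective memory type of a 4KiB
mapping is selected by three page table entry bits, PWT, PCD and PAT,
which together form an index into the IA32_PAT MSR.  Two mappings of
the same machine frame whose bits differ can therefore carry, for
example, write-back and uncachable attributes at the same time, which
is the inconsistency described above.  The flag values used here are
the standard x86 ones and are not quoted from the advisory:

    /* Illustrative only: cachability bits of a 4KiB L1 PTE and the
     * 3-bit index they form into the 8-entry IA32_PAT MSR. */
    #include <stdint.h>

    #define _PAGE_PWT 0x008   /* bit 3: write-through */
    #define _PAGE_PCD 0x010   /* bit 4: cache disable */
    #define _PAGE_PAT 0x080   /* bit 7: PAT bit of a 4KiB mapping */

    unsigned int pat_index(uint64_t l1f)
    {
        /* Index layout is PAT:PCD:PWT, most significant bit first. */
        return (!!(l1f & _PAGE_PAT) << 2) |
               (!!(l1f & _PAGE_PCD) << 1) |
                !!(l1f & _PAGE_PWT);
    }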

IMPACT
======

A malicious guest administrator might be able to cause a reboot,
denying service to the entire host.

VULNERABLE SYSTEMS
==================

All Xen versions are affected.

Only x86 guests given control over some physical device can trigger
this vulnerability.

x86 systems are vulnerable.  ARM systems are not vulnerable.

The vulnerability depends on the system response to mapping the same
memory with different cacheability.  On some systems this is harmless;
on others, depending on CPU and chipset, it may be fatal.

MITIGATION
==========

Not handing physical devices to guests will avoid this issue.

CREDITS
=======

This issue was discovered by Jan Beulich of SUSE.

RESOLUTION
==========

We believe that the attached patch fixes the issue.  However, no
formal description of CPU behaviour in particular use cases has been
provided by Intel.  There has been no response from AMD.

We are aware of a potential performance regression with this patch on
some systems - even if no hardware passthrough is configured.  This is
due to the behaviour of some drivers and peripherals that is beyond
the scope of this security fix.  The patch adds a command line
option, "mmio-relax", which relaxes the new restriction for Domain 0
only (or, when given the value "all", for all para-virtual guests) in
order to overcome this possible regression.  Note however that
enabling this workaround will reinstate, for the domains it covers,
the security issue these patches aim to address.
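
As an illustration of enabling the workaround (the bootloader entry
shown is an assumption; the option name, its accepted values and its
default of "false" come from the xen-command-line.markdown addition
in the patch), the option is appended to the Xen hypervisor line of
the boot configuration:

    multiboot /boot/xen.gz <existing options> mmio-relax=true
    multiboot /boot/xen.gz <existing options> mmio-relax=all

"mmio-relax" (or "mmio-relax=true") relaxes the restriction for
Domain 0 only; "mmio-relax=all" relaxes it for all para-virtual
guests.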

xsa154.patch        xen-unstable
xsa154-4.6.patch    Xen 4.6.x
xsa154-4.5.patch    Xen 4.5.x
xsa154-4.4.patch    Xen 4.4.x
xsa154-4.3.patch    Xen 4.3.x

$ sha256sum xsa154*
bbe7fba38ee30c00ef850fa6419c769e88b5669164d447f50b1ebbe333573152  xsa154.patch
011a4e33c0e476c52fe44253d50e01a1185948fd1b2a8e645274b25da6030d71  xsa154-4.3.patch
92d475bbc344127faa4f0183a9ccca9e975c7d24eb5772bf0a0a0a2e019144c6  xsa154-4.4.patch
b13737e71f22185b94ab25c07afd521add1a7e3886326c719d5df4d42f3f87f4  xsa154-4.5.patch
eec88c2a57466f83a81844cb7025f70c2b671d07a75d85487d4ed73cdabbb020  xsa154-4.6.patch
$
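
A sketch of verifying and applying one of the backports (the name of
the unpacked source directory is an example, not prescribed by this
advisory):

    $ sha256sum xsa154-4.6.patch      # compare against the list above
    $ cd xen-4.6.x                    # top of the matching Xen source tree
    $ patch -p1 < ../xsa154-4.6.patch

The hypervisor then needs to be rebuilt and reinstalled, and the host
rebooted, for the fix to take effect.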

DEPLOYMENT DURING EMBARGO
=========================

Deployment of the patch described above (or others which are
substantially similar) is permitted during the embargo, even on
public-facing systems with untrusted guest users and administrators.

However, deployment of the mitigation described above is NOT permitted
(except where all the affected systems and VMs are administered and
used only by organisations which are members of the Xen Project
Security Issues Predisclosure List).  Specifically, deployment on
public cloud systems is NOT permitted.

This is because the configuration change would be visible to the guest,
which could lead to the rediscovery of the vulnerability.

But: Distribution of updated software is prohibited (except to other
members of the predisclosure list).

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations, please contact the Xen Project Security
Team.


(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decisionmaking.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iQEcBAEBAgAGBQJWxGayAAoJEIP+FMlX6CvZ9KwH/3z+9b7OjgpuIsOf0giZ5y99
yKoORWxQjcosYLQRQXvH62xtz0xRng+E3p+MeUm2qPUUuHFiqxSpZOAvW61C6DQL
l5KNNHlIjWB3N0YVmvgRbf3WMbeX1DCsEJEIFxZUQQs3fgGAiOfIEOwRL2FIhJ5Y
wP/z59fCuWs5lHoV0iAY3gkZHDd09JspCRQq8UGAc+X5jHF6fIOhUjZCS9KRQMJ5
p69ysdMj96fY5eKqwka/EXzvKMJUsQ42u5RQoYR5FhLx1UBi2otdcdbloKNseksA
7Wbf6j8Mz9NWVhvdZtnR/CNH8m5V7d78HsnGv7zNQCiMW+wg/k53yzHcw550P4w=
=5V3D
-----END PGP SIGNATURE-----

[-- Attachment #2: xsa154.patch --]
[-- Type: application/octet-stream, Size: 13272 bytes --]

x86: enforce consistent cachability of MMIO mappings

We've been told by Intel that inconsistent cachability between
multiple mappings of the same page can affect system stability only
when the affected page is an MMIO one. Since the stale data issue is
of no relevance to the hypervisor (since all guest memory accesses go
through proper accessors and validation), handling of RAM pages
remains unchanged here. Any MMIO mapped by domains however needs to be
done consistently (all cachable mappings or all uncachable ones), in
order to avoid Machine Check exceptions. Since converting existing
cachable mappings to uncachable (at the time an uncachable mapping
gets established) would in the PV case require tracking all mappings,
allow MMIO to only get mapped uncachable (UC, UC-, or WC).

This also implies that in the PV case we mustn't use the L1 PTE update
fast path when cachability flags get altered.

Since in the HVM case at least for now we want to continue honoring
pinned cachability attributes for pages not mapped by the hypervisor,
special case handling of r/o MMIO pages (forcing UC) gets added there.
Arguably the counterpart change to p2m-pt.c may not be necessary, since
UC- (which already gets enforced there) is probably strict enough.

Note that the shadow code changes include fixing the write protection
of r/o MMIO ranges: shadow_l1e_remove_flags() and its siblings, other
than l1e_remove_flags() and alike, return the new PTE (and hence
ignoring their return values makes them no-ops).

This is CVE-2016-2270 / XSA-154.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1084,6 +1084,15 @@ limit is ignored by Xen.
 
 Specify if the MMConfig space should be enabled.
 
+### mmio-relax
+> `= <boolean> | all`
+
+> Default: `false`
+
+By default, domains may not create cached mappings to MMIO regions.
+This option relaxes the check for Domain 0 (or when using `all`, all PV
+domains), to permit the use of cacheable MMIO mappings.
+
 ### msi
 > `= <boolean>`
 
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -770,8 +770,17 @@ int epte_get_entry_emt(struct domain *d,
     if ( v->domain != d )
         v = d->vcpu ? d->vcpu[0] : NULL;
 
-    if ( !mfn_valid(mfn_x(mfn)) )
+    if ( !mfn_valid(mfn_x(mfn)) ||
+         rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
+                                 mfn_x(mfn) + (1UL << order) - 1) )
+    {
+        *ipat = 1;
         return MTRR_TYPE_UNCACHABLE;
+    }
+
+    if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
+                                 mfn_x(mfn) + (1UL << order) - 1) )
+        return -1;
 
     switch ( hvm_get_mem_pinned_cacheattr(d, gfn, order, &type) )
     {
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -109,7 +109,10 @@ static unsigned long p2m_type_to_flags(p
         if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) )
             flags |= _PAGE_RW;
         else
+        {
+            flags |= _PAGE_PWT;
             ASSERT(!level);
+        }
         return flags | P2M_BASE_FLAGS | _PAGE_PCD;
     }
 }
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -534,6 +534,7 @@ _sh_propagate(struct vcpu *v,
     gfn_t target_gfn = guest_l1e_get_gfn(guest_entry);
     u32 pass_thru_flags;
     u32 gflags, sflags;
+    bool_t mmio_mfn;
 
     /* We don't shadow PAE l3s */
     ASSERT(GUEST_PAGING_LEVELS > 3 || level != 3);
@@ -574,7 +575,10 @@ _sh_propagate(struct vcpu *v,
     // mfn means that we can not usefully shadow anything, and so we
     // return early.
     //
-    if ( !mfn_valid(target_mfn)
+    mmio_mfn = !mfn_valid(target_mfn)
+               || (level == 1
+                   && page_get_owner(mfn_to_page(target_mfn)) == dom_io);
+    if ( mmio_mfn
          && !(level == 1 && (!shadow_mode_refcounts(d)
                              || p2mt == p2m_mmio_direct)) )
     {
@@ -592,7 +596,7 @@ _sh_propagate(struct vcpu *v,
                        _PAGE_RW | _PAGE_PRESENT);
     if ( guest_supports_nx(v) )
         pass_thru_flags |= _PAGE_NX_BIT;
-    if ( !shadow_mode_refcounts(d) && !mfn_valid(target_mfn) )
+    if ( level == 1 && !shadow_mode_refcounts(d) && mmio_mfn )
         pass_thru_flags |= _PAGE_PAT | _PAGE_PCD | _PAGE_PWT;
     sflags = gflags & pass_thru_flags;
 
@@ -691,10 +695,14 @@ _sh_propagate(struct vcpu *v,
     }
 
     /* Read-only memory */
-    if ( p2m_is_readonly(p2mt) ||
-         (p2mt == p2m_mmio_direct &&
-          rangeset_contains_singleton(mmio_ro_ranges, mfn_x(target_mfn))) )
+    if ( p2m_is_readonly(p2mt) )
         sflags &= ~_PAGE_RW;
+    else if ( p2mt == p2m_mmio_direct &&
+              rangeset_contains_singleton(mmio_ro_ranges, mfn_x(target_mfn)) )
+    {
+        sflags &= ~(_PAGE_RW | _PAGE_PAT);
+        sflags |= _PAGE_PCD | _PAGE_PWT;
+    }
 
     // protect guest page tables
     //
@@ -1200,22 +1208,28 @@ static int shadow_set_l1e(struct domain
          && !sh_l1e_is_magic(new_sl1e) )
     {
         /* About to install a new reference */
-        if ( shadow_mode_refcounts(d) ) {
+        if ( shadow_mode_refcounts(d) )
+        {
+#define PAGE_FLIPPABLE (_PAGE_RW | _PAGE_PWT | _PAGE_PCD | _PAGE_PAT)
+            int rc;
+
             TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_SHADOW_L1_GET_REF);
-            switch ( shadow_get_page_from_l1e(new_sl1e, d, new_type) )
+            switch ( rc = shadow_get_page_from_l1e(new_sl1e, d, new_type) )
             {
             default:
                 /* Doesn't look like a pagetable. */
                 flags |= SHADOW_SET_ERROR;
                 new_sl1e = shadow_l1e_empty();
                 break;
-            case 1:
-                shadow_l1e_remove_flags(new_sl1e, _PAGE_RW);
+            case PAGE_FLIPPABLE & -PAGE_FLIPPABLE ... PAGE_FLIPPABLE:
+                ASSERT(!(rc & ~PAGE_FLIPPABLE));
+                new_sl1e = shadow_l1e_flip_flags(new_sl1e, rc);
                 /* fall through */
             case 0:
                 shadow_vram_get_l1e(new_sl1e, sl1e, sl1mfn, d);
                 break;
             }
+#undef PAGE_FLIPPABLE
         }
     }
 
--- a/xen/arch/x86/mm/shadow/types.h
+++ b/xen/arch/x86/mm/shadow/types.h
@@ -99,6 +99,9 @@ static inline u32 shadow_l4e_get_flags(s
 static inline shadow_l1e_t
 shadow_l1e_remove_flags(shadow_l1e_t sl1e, u32 flags)
 { l1e_remove_flags(sl1e, flags); return sl1e; }
+static inline shadow_l1e_t
+shadow_l1e_flip_flags(shadow_l1e_t sl1e, u32 flags)
+{ l1e_flip_flags(sl1e, flags); return sl1e; }
 
 static inline shadow_l1e_t shadow_l1e_empty(void)
 { return l1e_empty(); }
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -178,6 +178,18 @@ static uint32_t base_disallow_mask;
       is_pv_domain(d)) ?                                        \
      L1_DISALLOW_MASK : (L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS))
 
+static s8 __read_mostly opt_mmio_relax;
+static void __init parse_mmio_relax(const char *s)
+{
+    if ( !*s )
+        opt_mmio_relax = 1;
+    else
+        opt_mmio_relax = parse_bool(s);
+    if ( opt_mmio_relax < 0 && strcmp(s, "all") )
+        opt_mmio_relax = 0;
+}
+custom_param("mmio-relax", parse_mmio_relax);
+
 static void __init init_frametable_chunk(void *start, void *end)
 {
     unsigned long s = (unsigned long)start;
@@ -871,10 +883,7 @@ get_page_from_l1e(
     if ( !mfn_valid(mfn) ||
          (real_pg_owner = page_get_owner_and_reference(page)) == dom_io )
     {
-#ifndef NDEBUG
-        const unsigned long *ro_map;
-        unsigned int seg, bdf;
-#endif
+        int flip = 0;
 
         /* Only needed the reference to confirm dom_io ownership. */
         if ( mfn_valid(mfn) )
@@ -908,24 +917,55 @@ get_page_from_l1e(
             return -EINVAL;
         }
 
-        if ( !(l1f & _PAGE_RW) ||
-             !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
-            return 0;
+        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+        {
+            /* MMIO pages must not be mapped cachable unless requested so. */
+            switch ( opt_mmio_relax )
+            {
+            case 0:
+                break;
+            case 1:
+                if ( is_hardware_domain(l1e_owner) )
+            case -1:
+                    return 0;
+            default:
+                ASSERT_UNREACHABLE();
+            }
+        }
+        else if ( l1f & _PAGE_RW )
+        {
 #ifndef NDEBUG
-        if ( !pci_mmcfg_decode(mfn, &seg, &bdf) ||
-             ((ro_map = pci_get_ro_map(seg)) != NULL &&
-              test_bit(bdf, ro_map)) )
-            printk(XENLOG_G_WARNING
-                   "d%d: Forcing read-only access to MFN %lx\n",
-                   l1e_owner->domain_id, mfn);
-        else
-            rangeset_report_ranges(mmio_ro_ranges, 0, ~0UL,
-                                   print_mmio_emul_range,
-                                   &(struct mmio_emul_range_ctxt){
-                                      .d = l1e_owner,
-                                      .mfn = mfn });
+            const unsigned long *ro_map;
+            unsigned int seg, bdf;
+
+            if ( !pci_mmcfg_decode(mfn, &seg, &bdf) ||
+                 ((ro_map = pci_get_ro_map(seg)) != NULL &&
+                  test_bit(bdf, ro_map)) )
+                printk(XENLOG_G_WARNING
+                       "d%d: Forcing read-only access to MFN %lx\n",
+                       l1e_owner->domain_id, mfn);
+            else
+                rangeset_report_ranges(mmio_ro_ranges, 0, ~0UL,
+                                       print_mmio_emul_range,
+                                       &(struct mmio_emul_range_ctxt){
+                                           .d = l1e_owner,
+                                           .mfn = mfn });
 #endif
-        return 1;
+            flip = _PAGE_RW;
+        }
+
+        switch ( l1f & PAGE_CACHE_ATTRS )
+        {
+        case 0: /* WB */
+            flip |= _PAGE_PWT | _PAGE_PCD;
+            break;
+        case _PAGE_PWT: /* WT */
+        case _PAGE_PWT | _PAGE_PAT: /* WP */
+            flip |= _PAGE_PCD | (l1f & _PAGE_PAT);
+            break;
+        }
+
+        return flip;
     }
 
     if ( unlikely( (real_pg_owner != pg_owner) &&
@@ -1315,8 +1355,9 @@ static int alloc_l1_table(struct page_in
                 goto fail;
             case 0:
                 break;
-            case 1:
-                l1e_remove_flags(pl1e[i], _PAGE_RW);
+            case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+                ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+                l1e_flip_flags(pl1e[i], ret);
                 break;
             }
 
@@ -1849,8 +1890,9 @@ static int mod_l1_entry(l1_pgentry_t *pl
             return -EINVAL;
         }
 
-        /* Fast path for identical mapping, r/w and presence. */
-        if ( !l1e_has_changed(ol1e, nl1e, _PAGE_RW | _PAGE_PRESENT) )
+        /* Fast path for identical mapping, r/w, presence, and cachability. */
+        if ( !l1e_has_changed(ol1e, nl1e,
+                              PAGE_CACHE_ATTRS | _PAGE_RW | _PAGE_PRESENT) )
         {
             adjust_guest_l1e(nl1e, pt_dom);
             if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
@@ -1873,8 +1915,9 @@ static int mod_l1_entry(l1_pgentry_t *pl
             return rc;
         case 0:
             break;
-        case 1:
-            l1e_remove_flags(nl1e, _PAGE_RW);
+        case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+            ASSERT(!(rc & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+            l1e_flip_flags(nl1e, rc);
             rc = 0;
             break;
         }
@@ -5109,6 +5152,7 @@ static int ptwr_emulated_update(
     l1_pgentry_t pte, ol1e, nl1e, *pl1e;
     struct vcpu *v = current;
     struct domain *d = v->domain;
+    int ret;
 
     /* Only allow naturally-aligned stores within the original %cr2 page. */
     if ( unlikely(((addr^ptwr_ctxt->cr2) & PAGE_MASK) || (addr & (bytes-1))) )
@@ -5156,7 +5200,7 @@ static int ptwr_emulated_update(
 
     /* Check the new PTE. */
     nl1e = l1e_from_intpte(val);
-    switch ( get_page_from_l1e(nl1e, d, d) )
+    switch ( ret = get_page_from_l1e(nl1e, d, d) )
     {
     default:
         if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
@@ -5180,8 +5224,9 @@ static int ptwr_emulated_update(
         break;
     case 0:
         break;
-    case 1:
-        l1e_remove_flags(nl1e, _PAGE_RW);
+    case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+        ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+        l1e_flip_flags(nl1e, ret);
         break;
     }
 
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -157,6 +157,9 @@ static inline l4_pgentry_t l4e_from_padd
 #define l3e_remove_flags(x, flags) ((x).l3 &= ~put_pte_flags(flags))
 #define l4e_remove_flags(x, flags) ((x).l4 &= ~put_pte_flags(flags))
 
+/* Flip flags in an existing L1 PTE. */
+#define l1e_flip_flags(x, flags)    ((x).l1 ^= put_pte_flags(flags))
+
 /* Check if a pte's page mapping or significant access flags have changed. */
 #define l1e_has_changed(x,y,flags) \
     ( !!(((x).l1 ^ (y).l1) & ((PADDR_MASK&PAGE_MASK)|put_pte_flags(flags))) )

[-- Attachment #3: xsa154-4.3.patch --]
[-- Type: application/octet-stream, Size: 11895 bytes --]

x86: enforce consistent cachability of MMIO mappings

We've been told by Intel that inconsistent cachability between
multiple mappings of the same page can affect system stability only
when the affected page is an MMIO one. Since the stale data issue is
of no relevance to the hypervisor (since all guest memory accesses go
through proper accessors and validation), handling of RAM pages
remains unchanged here. Any MMIO mapped by domains however needs to be
done consistently (all cachable mappings or all uncachable ones), in
order to avoid Machine Check exceptions. Since converting existing
cachable mappings to uncachable (at the time an uncachable mapping
gets established) would in the PV case require tracking all mappings,
allow MMIO to only get mapped uncachable (UC, UC-, or WC).

This also implies that in the PV case we mustn't use the L1 PTE update
fast path when cachability flags get altered.

Since in the HVM case at least for now we want to continue honoring
pinned cachability attributes for pages not mapped by the hypervisor,
special case handling of r/o MMIO pages (forcing UC) gets added there.
Arguably the counterpart change to p2m-pt.c may not be necessary, since
UC- (which already gets enforced there) is probably strict enough.

Note that the shadow code changes include fixing the write protection
of r/o MMIO ranges: shadow_l1e_remove_flags() and its siblings, other
than l1e_remove_flags() and alike, return the new PTE (and hence
ignoring their return values makes them no-ops).

This is CVE-2016-2270 / XSA-154.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -656,6 +656,15 @@ limit is ignored by Xen.
 
 Specify if the MMConfig space should be enabled.
 
+### mmio-relax
+> `= <boolean> | all`
+
+> Default: `false`
+
+By default, domains may not create cached mappings to MMIO regions.
+This option relaxes the check for Domain 0 (or when using `all`, all PV
+domains), to permit the use of cacheable MMIO mappings.
+
 ### msi
 > `= <boolean>`
 
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -693,8 +693,12 @@ uint8_t epte_get_entry_emt(struct domain
     if ( v->domain != d )
         v = d->vcpu ? d->vcpu[0] : NULL;
 
-    if ( !mfn_valid(mfn_x(mfn)) )
+    if ( !mfn_valid(mfn_x(mfn)) ||
+         rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) )
+    {
+        *ipat = 1;
         return MTRR_TYPE_UNCACHABLE;
+    }
 
     if ( hvm_get_mem_pinned_cacheattr(d, gfn, &type) )
         return type;
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -93,6 +93,8 @@ static unsigned long p2m_type_to_flags(p
     case p2m_mmio_direct:
         if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) )
             flags |= _PAGE_RW;
+        else
+            flags |= _PAGE_PWT;
         return flags | P2M_BASE_FLAGS | _PAGE_PCD;
     }
 }
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -521,6 +521,7 @@ _sh_propagate(struct vcpu *v,
     gfn_t target_gfn = guest_l1e_get_gfn(guest_entry);
     u32 pass_thru_flags;
     u32 gflags, sflags;
+    bool_t mmio_mfn;
 
     /* We don't shadow PAE l3s */
     ASSERT(GUEST_PAGING_LEVELS > 3 || level != 3);
@@ -561,7 +562,10 @@ _sh_propagate(struct vcpu *v,
     // mfn means that we can not usefully shadow anything, and so we
     // return early.
     //
-    if ( !mfn_valid(target_mfn)
+    mmio_mfn = !mfn_valid(target_mfn)
+               || (level == 1
+                   && page_get_owner(mfn_to_page(target_mfn)) == dom_io);
+    if ( mmio_mfn
          && !(level == 1 && (!shadow_mode_refcounts(d) 
                              || p2mt == p2m_mmio_direct)) )
     {
@@ -579,7 +583,7 @@ _sh_propagate(struct vcpu *v,
                        _PAGE_RW | _PAGE_PRESENT);
     if ( guest_supports_nx(v) )
         pass_thru_flags |= _PAGE_NX_BIT;
-    if ( !shadow_mode_refcounts(d) && !mfn_valid(target_mfn) )
+    if ( level == 1 && !shadow_mode_refcounts(d) && mmio_mfn )
         pass_thru_flags |= _PAGE_PAT | _PAGE_PCD | _PAGE_PWT;
     sflags = gflags & pass_thru_flags;
 
@@ -676,10 +680,14 @@ _sh_propagate(struct vcpu *v,
     }
 
     /* Read-only memory */
-    if ( p2m_is_readonly(p2mt) ||
-         (p2mt == p2m_mmio_direct &&
-          rangeset_contains_singleton(mmio_ro_ranges, mfn_x(target_mfn))) )
+    if ( p2m_is_readonly(p2mt) )
         sflags &= ~_PAGE_RW;
+    else if ( p2mt == p2m_mmio_direct &&
+              rangeset_contains_singleton(mmio_ro_ranges, mfn_x(target_mfn)) )
+    {
+        sflags &= ~(_PAGE_RW | _PAGE_PAT);
+        sflags |= _PAGE_PCD | _PAGE_PWT;
+    }
     
     // protect guest page tables
     //
@@ -1201,22 +1209,28 @@ static int shadow_set_l1e(struct vcpu *v
          && !sh_l1e_is_magic(new_sl1e) ) 
     {
         /* About to install a new reference */        
-        if ( shadow_mode_refcounts(d) ) {
+        if ( shadow_mode_refcounts(d) )
+        {
+#define PAGE_FLIPPABLE (_PAGE_RW | _PAGE_PWT | _PAGE_PCD | _PAGE_PAT)
+            int rc;
+
             TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_SHADOW_L1_GET_REF);
-            switch ( shadow_get_page_from_l1e(new_sl1e, d, new_type) )
+            switch ( rc = shadow_get_page_from_l1e(new_sl1e, d, new_type) )
             {
             default:
                 /* Doesn't look like a pagetable. */
                 flags |= SHADOW_SET_ERROR;
                 new_sl1e = shadow_l1e_empty();
                 break;
-            case 1:
-                shadow_l1e_remove_flags(new_sl1e, _PAGE_RW);
+            case PAGE_FLIPPABLE & -PAGE_FLIPPABLE ... PAGE_FLIPPABLE:
+                ASSERT(!(rc & ~PAGE_FLIPPABLE));
+                new_sl1e = shadow_l1e_flip_flags(new_sl1e, rc);
                 /* fall through */
             case 0:
                 shadow_vram_get_l1e(new_sl1e, sl1e, sl1mfn, d);
                 break;
             }
+#undef PAGE_FLIPPABLE
         }
     } 
 
--- a/xen/arch/x86/mm/shadow/types.h
+++ b/xen/arch/x86/mm/shadow/types.h
@@ -100,6 +100,9 @@ static inline u32 shadow_l4e_get_flags(s
 static inline shadow_l1e_t
 shadow_l1e_remove_flags(shadow_l1e_t sl1e, u32 flags)
 { l1e_remove_flags(sl1e, flags); return sl1e; }
+static inline shadow_l1e_t
+shadow_l1e_flip_flags(shadow_l1e_t sl1e, u32 flags)
+{ l1e_flip_flags(sl1e, flags); return sl1e; }
 
 static inline shadow_l1e_t shadow_l1e_empty(void) 
 { return l1e_empty(); }
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -188,6 +188,18 @@ static uint32_t base_disallow_mask;
       !is_hvm_domain(d)) ?                                      \
      L1_DISALLOW_MASK : (L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS))
 
+static s8 __read_mostly opt_mmio_relax;
+static void __init parse_mmio_relax(const char *s)
+{
+    if ( !*s )
+        opt_mmio_relax = 1;
+    else
+        opt_mmio_relax = parse_bool(s);
+    if ( opt_mmio_relax < 0 && strcmp(s, "all") )
+        opt_mmio_relax = 0;
+}
+custom_param("mmio-relax", parse_mmio_relax);
+
 static void __init init_frametable_chunk(void *start, void *end)
 {
     unsigned long s = (unsigned long)start;
@@ -773,6 +785,8 @@ get_page_from_l1e(
     if ( !mfn_valid(mfn) ||
          (real_pg_owner = page_get_owner_and_reference(page)) == dom_io )
     {
+        int flip = 0;
+
         /* Only needed the reference to confirm dom_io ownership. */
         if ( mfn_valid(mfn) )
             put_page(page);
@@ -805,13 +819,41 @@ get_page_from_l1e(
             return -EINVAL;
         }
 
-        if ( !(l1f & _PAGE_RW) ||
-             !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
-            return 0;
-        dprintk(XENLOG_G_WARNING,
-                "d%d: Forcing read-only access to MFN %lx\n",
-                l1e_owner->domain_id, mfn);
-        return 1;
+        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+        {
+            /* MMIO pages must not be mapped cachable unless requested so. */
+            switch ( opt_mmio_relax )
+            {
+            case 0:
+                break;
+            case 1:
+                if ( is_hardware_domain(l1e_owner) )
+            case -1:
+                    return 0;
+            default:
+                ASSERT(0);
+            }
+        }
+        else if ( l1f & _PAGE_RW )
+        {
+            dprintk(XENLOG_G_WARNING,
+                    "d%d: Forcing read-only access to MFN %lx\n",
+                    l1e_owner->domain_id, mfn);
+            flip = _PAGE_RW;
+        }
+
+        switch ( l1f & PAGE_CACHE_ATTRS )
+        {
+        case 0: /* WB */
+            flip |= _PAGE_PWT | _PAGE_PCD;
+            break;
+        case _PAGE_PWT: /* WT */
+        case _PAGE_PWT | _PAGE_PAT: /* WP */
+            flip |= _PAGE_PCD | (l1f & _PAGE_PAT);
+            break;
+        }
+
+        return flip;
     }
 
     if ( unlikely( (real_pg_owner != pg_owner) &&
@@ -1210,8 +1252,9 @@ static int alloc_l1_table(struct page_in
                 goto fail;
             case 0:
                 break;
-            case 1:
-                l1e_remove_flags(pl1e[i], _PAGE_RW);
+            case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+                ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+                l1e_flip_flags(pl1e[i], ret);
                 break;
             }
 
@@ -1706,8 +1749,9 @@ static int mod_l1_entry(l1_pgentry_t *pl
             return -EINVAL;
         }
 
-        /* Fast path for identical mapping, r/w and presence. */
-        if ( !l1e_has_changed(ol1e, nl1e, _PAGE_RW | _PAGE_PRESENT) )
+        /* Fast path for identical mapping, r/w, presence, and cachability. */
+        if ( !l1e_has_changed(ol1e, nl1e,
+                              PAGE_CACHE_ATTRS | _PAGE_RW | _PAGE_PRESENT) )
         {
             adjust_guest_l1e(nl1e, pt_dom);
             if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
@@ -1730,8 +1774,9 @@ static int mod_l1_entry(l1_pgentry_t *pl
             return rc;
         case 0:
             break;
-        case 1:
-            l1e_remove_flags(nl1e, _PAGE_RW);
+        case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+            ASSERT(!(rc & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+            l1e_flip_flags(nl1e, rc);
             rc = 0;
             break;
         }
@@ -5010,6 +5055,7 @@ static int ptwr_emulated_update(
     l1_pgentry_t pte, ol1e, nl1e, *pl1e;
     struct vcpu *v = current;
     struct domain *d = v->domain;
+    int ret;
 
     /* Only allow naturally-aligned stores within the original %cr2 page. */
     if ( unlikely(((addr^ptwr_ctxt->cr2) & PAGE_MASK) || (addr & (bytes-1))) )
@@ -5057,7 +5103,7 @@ static int ptwr_emulated_update(
 
     /* Check the new PTE. */
     nl1e = l1e_from_intpte(val);
-    switch ( get_page_from_l1e(nl1e, d, d) )
+    switch ( ret = get_page_from_l1e(nl1e, d, d) )
     {
     default:
         if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
@@ -5081,8 +5127,9 @@ static int ptwr_emulated_update(
         break;
     case 0:
         break;
-    case 1:
-        l1e_remove_flags(nl1e, _PAGE_RW);
+    case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+        ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+        l1e_flip_flags(nl1e, ret);
         break;
     }
 
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -157,6 +157,9 @@ static inline l4_pgentry_t l4e_from_padd
 #define l3e_remove_flags(x, flags) ((x).l3 &= ~put_pte_flags(flags))
 #define l4e_remove_flags(x, flags) ((x).l4 &= ~put_pte_flags(flags))
 
+/* Flip flags in an existing L1 PTE. */
+#define l1e_flip_flags(x, flags)    ((x).l1 ^= put_pte_flags(flags))
+
 /* Check if a pte's page mapping or significant access flags have changed. */
 #define l1e_has_changed(x,y,flags) \
     ( !!(((x).l1 ^ (y).l1) & ((PADDR_MASK&PAGE_MASK)|put_pte_flags(flags))) )

[-- Attachment #4: xsa154-4.4.patch --]
[-- Type: application/octet-stream, Size: 11906 bytes --]

x86: enforce consistent cachability of MMIO mappings

We've been told by Intel that inconsistent cachability between
multiple mappings of the same page can affect system stability only
when the affected page is an MMIO one. Since the stale data issue is
of no relevance to the hypervisor (since all guest memory accesses go
through proper accessors and validation), handling of RAM pages
remains unchanged here. Any MMIO mapped by domains however needs to be
done consistently (all cachable mappings or all uncachable ones), in
order to avoid Machine Check exceptions. Since converting existing
cachable mappings to uncachable (at the time an uncachable mapping
gets established) would in the PV case require tracking all mappings,
allow MMIO to only get mapped uncachable (UC, UC-, or WC).

This also implies that in the PV case we mustn't use the L1 PTE update
fast path when cachability flags get altered.

Since in the HVM case at least for now we want to continue honoring
pinned cachability attributes for pages not mapped by the hypervisor,
special case handling of r/o MMIO pages (forcing UC) gets added there.
Arguably the counterpart change to p2m-pt.c may not be necessary, since
UC- (which already gets enforced there) is probably strict enough.

Note that the shadow code changes include fixing the write protection
of r/o MMIO ranges: shadow_l1e_remove_flags() and its siblings, other
than l1e_remove_flags() and alike, return the new PTE (and hence
ignoring their return values makes them no-ops).

This is CVE-2016-2270 / XSA-154.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -697,6 +697,15 @@ limit is ignored by Xen.
 
 Specify if the MMConfig space should be enabled.
 
+### mmio-relax
+> `= <boolean> | all`
+
+> Default: `false`
+
+By default, domains may not create cached mappings to MMIO regions.
+This option relaxes the check for Domain 0 (or when using `all`, all PV
+domains), to permit the use of cacheable MMIO mappings.
+
 ### msi
 > `= <boolean>`
 
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -693,8 +693,12 @@ uint8_t epte_get_entry_emt(struct domain
     if ( v->domain != d )
         v = d->vcpu ? d->vcpu[0] : NULL;
 
-    if ( !mfn_valid(mfn_x(mfn)) )
+    if ( !mfn_valid(mfn_x(mfn)) ||
+         rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) )
+    {
+        *ipat = 1;
         return MTRR_TYPE_UNCACHABLE;
+    }
 
     if ( hvm_get_mem_pinned_cacheattr(d, gfn, &type) )
         return type;
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -93,6 +93,8 @@ static unsigned long p2m_type_to_flags(p
     case p2m_mmio_direct:
         if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) )
             flags |= _PAGE_RW;
+        else
+            flags |= _PAGE_PWT;
         return flags | P2M_BASE_FLAGS | _PAGE_PCD;
     }
 }
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -521,6 +521,7 @@ _sh_propagate(struct vcpu *v,
     gfn_t target_gfn = guest_l1e_get_gfn(guest_entry);
     u32 pass_thru_flags;
     u32 gflags, sflags;
+    bool_t mmio_mfn;
 
     /* We don't shadow PAE l3s */
     ASSERT(GUEST_PAGING_LEVELS > 3 || level != 3);
@@ -561,7 +562,10 @@ _sh_propagate(struct vcpu *v,
     // mfn means that we can not usefully shadow anything, and so we
     // return early.
     //
-    if ( !mfn_valid(target_mfn)
+    mmio_mfn = !mfn_valid(target_mfn)
+               || (level == 1
+                   && page_get_owner(mfn_to_page(target_mfn)) == dom_io);
+    if ( mmio_mfn
          && !(level == 1 && (!shadow_mode_refcounts(d) 
                              || p2mt == p2m_mmio_direct)) )
     {
@@ -579,7 +583,7 @@ _sh_propagate(struct vcpu *v,
                        _PAGE_RW | _PAGE_PRESENT);
     if ( guest_supports_nx(v) )
         pass_thru_flags |= _PAGE_NX_BIT;
-    if ( !shadow_mode_refcounts(d) && !mfn_valid(target_mfn) )
+    if ( level == 1 && !shadow_mode_refcounts(d) && mmio_mfn )
         pass_thru_flags |= _PAGE_PAT | _PAGE_PCD | _PAGE_PWT;
     sflags = gflags & pass_thru_flags;
 
@@ -676,10 +680,14 @@ _sh_propagate(struct vcpu *v,
     }
 
     /* Read-only memory */
-    if ( p2m_is_readonly(p2mt) ||
-         (p2mt == p2m_mmio_direct &&
-          rangeset_contains_singleton(mmio_ro_ranges, mfn_x(target_mfn))) )
+    if ( p2m_is_readonly(p2mt) )
         sflags &= ~_PAGE_RW;
+    else if ( p2mt == p2m_mmio_direct &&
+              rangeset_contains_singleton(mmio_ro_ranges, mfn_x(target_mfn)) )
+    {
+        sflags &= ~(_PAGE_RW | _PAGE_PAT);
+        sflags |= _PAGE_PCD | _PAGE_PWT;
+    }
     
     // protect guest page tables
     //
@@ -1201,22 +1209,28 @@ static int shadow_set_l1e(struct vcpu *v
          && !sh_l1e_is_magic(new_sl1e) ) 
     {
         /* About to install a new reference */        
-        if ( shadow_mode_refcounts(d) ) {
+        if ( shadow_mode_refcounts(d) )
+        {
+#define PAGE_FLIPPABLE (_PAGE_RW | _PAGE_PWT | _PAGE_PCD | _PAGE_PAT)
+            int rc;
+
             TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_SHADOW_L1_GET_REF);
-            switch ( shadow_get_page_from_l1e(new_sl1e, d, new_type) )
+            switch ( rc = shadow_get_page_from_l1e(new_sl1e, d, new_type) )
             {
             default:
                 /* Doesn't look like a pagetable. */
                 flags |= SHADOW_SET_ERROR;
                 new_sl1e = shadow_l1e_empty();
                 break;
-            case 1:
-                shadow_l1e_remove_flags(new_sl1e, _PAGE_RW);
+            case PAGE_FLIPPABLE & -PAGE_FLIPPABLE ... PAGE_FLIPPABLE:
+                ASSERT(!(rc & ~PAGE_FLIPPABLE));
+                new_sl1e = shadow_l1e_flip_flags(new_sl1e, rc);
                 /* fall through */
             case 0:
                 shadow_vram_get_l1e(new_sl1e, sl1e, sl1mfn, d);
                 break;
             }
+#undef PAGE_FLIPPABLE
         }
     } 
 
--- a/xen/arch/x86/mm/shadow/types.h
+++ b/xen/arch/x86/mm/shadow/types.h
@@ -100,6 +100,9 @@ static inline u32 shadow_l4e_get_flags(s
 static inline shadow_l1e_t
 shadow_l1e_remove_flags(shadow_l1e_t sl1e, u32 flags)
 { l1e_remove_flags(sl1e, flags); return sl1e; }
+static inline shadow_l1e_t
+shadow_l1e_flip_flags(shadow_l1e_t sl1e, u32 flags)
+{ l1e_flip_flags(sl1e, flags); return sl1e; }
 
 static inline shadow_l1e_t shadow_l1e_empty(void) 
 { return l1e_empty(); }
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -188,6 +188,18 @@ static uint32_t base_disallow_mask;
       is_pv_domain(d)) ?                                        \
      L1_DISALLOW_MASK : (L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS))
 
+static s8 __read_mostly opt_mmio_relax;
+static void __init parse_mmio_relax(const char *s)
+{
+    if ( !*s )
+        opt_mmio_relax = 1;
+    else
+        opt_mmio_relax = parse_bool(s);
+    if ( opt_mmio_relax < 0 && strcmp(s, "all") )
+        opt_mmio_relax = 0;
+}
+custom_param("mmio-relax", parse_mmio_relax);
+
 static void __init init_frametable_chunk(void *start, void *end)
 {
     unsigned long s = (unsigned long)start;
@@ -773,6 +785,8 @@ get_page_from_l1e(
     if ( !mfn_valid(mfn) ||
          (real_pg_owner = page_get_owner_and_reference(page)) == dom_io )
     {
+        int flip = 0;
+
         /* Only needed the reference to confirm dom_io ownership. */
         if ( mfn_valid(mfn) )
             put_page(page);
@@ -805,13 +819,41 @@ get_page_from_l1e(
             return -EINVAL;
         }
 
-        if ( !(l1f & _PAGE_RW) ||
-             !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
-            return 0;
-        dprintk(XENLOG_G_WARNING,
-                "d%d: Forcing read-only access to MFN %lx\n",
-                l1e_owner->domain_id, mfn);
-        return 1;
+        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+        {
+            /* MMIO pages must not be mapped cachable unless requested so. */
+            switch ( opt_mmio_relax )
+            {
+            case 0:
+                break;
+            case 1:
+                if ( is_hardware_domain(l1e_owner) )
+            case -1:
+                    return 0;
+            default:
+                ASSERT_UNREACHABLE();
+            }
+        }
+        else if ( l1f & _PAGE_RW )
+        {
+            dprintk(XENLOG_G_WARNING,
+                    "d%d: Forcing read-only access to MFN %lx\n",
+                    l1e_owner->domain_id, mfn);
+            flip = _PAGE_RW;
+        }
+
+        switch ( l1f & PAGE_CACHE_ATTRS )
+        {
+        case 0: /* WB */
+            flip |= _PAGE_PWT | _PAGE_PCD;
+            break;
+        case _PAGE_PWT: /* WT */
+        case _PAGE_PWT | _PAGE_PAT: /* WP */
+            flip |= _PAGE_PCD | (l1f & _PAGE_PAT);
+            break;
+        }
+
+        return flip;
     }
 
     if ( unlikely( (real_pg_owner != pg_owner) &&
@@ -1210,8 +1252,9 @@ static int alloc_l1_table(struct page_in
                 goto fail;
             case 0:
                 break;
-            case 1:
-                l1e_remove_flags(pl1e[i], _PAGE_RW);
+            case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+                ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+                l1e_flip_flags(pl1e[i], ret);
                 break;
             }
 
@@ -1706,8 +1749,9 @@ static int mod_l1_entry(l1_pgentry_t *pl
             return -EINVAL;
         }
 
-        /* Fast path for identical mapping, r/w and presence. */
-        if ( !l1e_has_changed(ol1e, nl1e, _PAGE_RW | _PAGE_PRESENT) )
+        /* Fast path for identical mapping, r/w, presence, and cachability. */
+        if ( !l1e_has_changed(ol1e, nl1e,
+                              PAGE_CACHE_ATTRS | _PAGE_RW | _PAGE_PRESENT) )
         {
             adjust_guest_l1e(nl1e, pt_dom);
             if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
@@ -1730,8 +1774,9 @@ static int mod_l1_entry(l1_pgentry_t *pl
             return rc;
         case 0:
             break;
-        case 1:
-            l1e_remove_flags(nl1e, _PAGE_RW);
+        case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+            ASSERT(!(rc & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+            l1e_flip_flags(nl1e, rc);
             rc = 0;
             break;
         }
@@ -4919,6 +4964,7 @@ static int ptwr_emulated_update(
     l1_pgentry_t pte, ol1e, nl1e, *pl1e;
     struct vcpu *v = current;
     struct domain *d = v->domain;
+    int ret;
 
     /* Only allow naturally-aligned stores within the original %cr2 page. */
     if ( unlikely(((addr^ptwr_ctxt->cr2) & PAGE_MASK) || (addr & (bytes-1))) )
@@ -4966,7 +5012,7 @@ static int ptwr_emulated_update(
 
     /* Check the new PTE. */
     nl1e = l1e_from_intpte(val);
-    switch ( get_page_from_l1e(nl1e, d, d) )
+    switch ( ret = get_page_from_l1e(nl1e, d, d) )
     {
     default:
         if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
@@ -4990,8 +5036,9 @@ static int ptwr_emulated_update(
         break;
     case 0:
         break;
-    case 1:
-        l1e_remove_flags(nl1e, _PAGE_RW);
+    case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+        ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+        l1e_flip_flags(nl1e, ret);
         break;
     }
 
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -157,6 +157,9 @@ static inline l4_pgentry_t l4e_from_padd
 #define l3e_remove_flags(x, flags) ((x).l3 &= ~put_pte_flags(flags))
 #define l4e_remove_flags(x, flags) ((x).l4 &= ~put_pte_flags(flags))
 
+/* Flip flags in an existing L1 PTE. */
+#define l1e_flip_flags(x, flags)    ((x).l1 ^= put_pte_flags(flags))
+
 /* Check if a pte's page mapping or significant access flags have changed. */
 #define l1e_has_changed(x,y,flags) \
     ( !!(((x).l1 ^ (y).l1) & ((PADDR_MASK&PAGE_MASK)|put_pte_flags(flags))) )

[-- Attachment #5: xsa154-4.5.patch --]
[-- Type: application/octet-stream, Size: 12122 bytes --]

x86: enforce consistent cachability of MMIO mappings

We've been told by Intel that inconsistent cachability between
multiple mappings of the same page can affect system stability only
when the affected page is an MMIO one. Since the stale data issue is
of no relevance to the hypervisor (since all guest memory accesses go
through proper accessors and validation), handling of RAM pages
remains unchanged here. Any MMIO mapped by domains however needs to be
done consistently (all cachable mappings or all uncachable ones), in
order to avoid Machine Check exceptions. Since converting existing
cachable mappings to uncachable (at the time an uncachable mapping
gets established) would in the PV case require tracking all mappings,
allow MMIO to only get mapped uncachable (UC, UC-, or WC).

This also implies that in the PV case we mustn't use the L1 PTE update
fast path when cachability flags get altered.

Since in the HVM case at least for now we want to continue honoring
pinned cachability attributes for pages not mapped by the hypervisor,
special case handling of r/o MMIO pages (forcing UC) gets added there.
Arguably the counterpart change to p2m-pt.c may not be necessary, since
UC- (which already gets enforced there) is probably strict enough.

Note that the shadow code changes include fixing the write protection
of r/o MMIO ranges: shadow_l1e_remove_flags() and its siblings, other
than l1e_remove_flags() and alike, return the new PTE (and hence
ignoring their return values makes them no-ops).

This is CVE-2016-2270 / XSA-154.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1000,6 +1000,15 @@ limit is ignored by Xen.
 
 Specify if the MMConfig space should be enabled.
 
+### mmio-relax
+> `= <boolean> | all`
+
+> Default: `false`
+
+By default, domains may not create cached mappings to MMIO regions.
+This option relaxes the check for Domain 0 (or when using `all`, all PV
+domains), to permit the use of cacheable MMIO mappings.
+
 ### msi
 > `= <boolean>`
 
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -810,8 +810,17 @@ int epte_get_entry_emt(struct domain *d,
     if ( v->domain != d )
         v = d->vcpu ? d->vcpu[0] : NULL;
 
-    if ( !mfn_valid(mfn_x(mfn)) )
+    if ( !mfn_valid(mfn_x(mfn)) ||
+         rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
+                                 mfn_x(mfn) + (1UL << order) - 1) )
+    {
+        *ipat = 1;
         return MTRR_TYPE_UNCACHABLE;
+    }
+
+    if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
+                                 mfn_x(mfn) + (1UL << order) - 1) )
+        return -1;
 
     switch ( hvm_get_mem_pinned_cacheattr(d, gfn, order, &type) )
     {
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -107,6 +107,8 @@ static unsigned long p2m_type_to_flags(p
     case p2m_mmio_direct:
         if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) )
             flags |= _PAGE_RW;
+        else
+            flags |= _PAGE_PWT;
         return flags | P2M_BASE_FLAGS | _PAGE_PCD;
     }
 }
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -521,6 +521,7 @@ _sh_propagate(struct vcpu *v,
     gfn_t target_gfn = guest_l1e_get_gfn(guest_entry);
     u32 pass_thru_flags;
     u32 gflags, sflags;
+    bool_t mmio_mfn;
 
     /* We don't shadow PAE l3s */
     ASSERT(GUEST_PAGING_LEVELS > 3 || level != 3);
@@ -561,7 +562,10 @@ _sh_propagate(struct vcpu *v,
     // mfn means that we can not usefully shadow anything, and so we
     // return early.
     //
-    if ( !mfn_valid(target_mfn)
+    mmio_mfn = !mfn_valid(target_mfn)
+               || (level == 1
+                   && page_get_owner(mfn_to_page(target_mfn)) == dom_io);
+    if ( mmio_mfn
          && !(level == 1 && (!shadow_mode_refcounts(d) 
                              || p2mt == p2m_mmio_direct)) )
     {
@@ -579,7 +583,7 @@ _sh_propagate(struct vcpu *v,
                        _PAGE_RW | _PAGE_PRESENT);
     if ( guest_supports_nx(v) )
         pass_thru_flags |= _PAGE_NX_BIT;
-    if ( !shadow_mode_refcounts(d) && !mfn_valid(target_mfn) )
+    if ( level == 1 && !shadow_mode_refcounts(d) && mmio_mfn )
         pass_thru_flags |= _PAGE_PAT | _PAGE_PCD | _PAGE_PWT;
     sflags = gflags & pass_thru_flags;
 
@@ -678,10 +682,14 @@ _sh_propagate(struct vcpu *v,
     }
 
     /* Read-only memory */
-    if ( p2m_is_readonly(p2mt) ||
-         (p2mt == p2m_mmio_direct &&
-          rangeset_contains_singleton(mmio_ro_ranges, mfn_x(target_mfn))) )
+    if ( p2m_is_readonly(p2mt) )
         sflags &= ~_PAGE_RW;
+    else if ( p2mt == p2m_mmio_direct &&
+              rangeset_contains_singleton(mmio_ro_ranges, mfn_x(target_mfn)) )
+    {
+        sflags &= ~(_PAGE_RW | _PAGE_PAT);
+        sflags |= _PAGE_PCD | _PAGE_PWT;
+    }
     
     // protect guest page tables
     //
@@ -1188,22 +1196,28 @@ static int shadow_set_l1e(struct vcpu *v
          && !sh_l1e_is_magic(new_sl1e) ) 
     {
         /* About to install a new reference */        
-        if ( shadow_mode_refcounts(d) ) {
+        if ( shadow_mode_refcounts(d) )
+        {
+#define PAGE_FLIPPABLE (_PAGE_RW | _PAGE_PWT | _PAGE_PCD | _PAGE_PAT)
+            int rc;
+
             TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_SHADOW_L1_GET_REF);
-            switch ( shadow_get_page_from_l1e(new_sl1e, d, new_type) )
+            switch ( rc = shadow_get_page_from_l1e(new_sl1e, d, new_type) )
             {
             default:
                 /* Doesn't look like a pagetable. */
                 flags |= SHADOW_SET_ERROR;
                 new_sl1e = shadow_l1e_empty();
                 break;
-            case 1:
-                shadow_l1e_remove_flags(new_sl1e, _PAGE_RW);
+            case PAGE_FLIPPABLE & -PAGE_FLIPPABLE ... PAGE_FLIPPABLE:
+                ASSERT(!(rc & ~PAGE_FLIPPABLE));
+                new_sl1e = shadow_l1e_flip_flags(new_sl1e, rc);
                 /* fall through */
             case 0:
                 shadow_vram_get_l1e(new_sl1e, sl1e, sl1mfn, d);
                 break;
             }
+#undef PAGE_FLIPPABLE
         }
     } 
 
--- a/xen/arch/x86/mm/shadow/types.h
+++ b/xen/arch/x86/mm/shadow/types.h
@@ -100,6 +100,9 @@ static inline u32 shadow_l4e_get_flags(s
 static inline shadow_l1e_t
 shadow_l1e_remove_flags(shadow_l1e_t sl1e, u32 flags)
 { l1e_remove_flags(sl1e, flags); return sl1e; }
+static inline shadow_l1e_t
+shadow_l1e_flip_flags(shadow_l1e_t sl1e, u32 flags)
+{ l1e_flip_flags(sl1e, flags); return sl1e; }
 
 static inline shadow_l1e_t shadow_l1e_empty(void) 
 { return l1e_empty(); }
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -181,6 +181,18 @@ static uint32_t base_disallow_mask;
       is_pv_domain(d)) ?                                        \
      L1_DISALLOW_MASK : (L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS))
 
+static s8 __read_mostly opt_mmio_relax;
+static void __init parse_mmio_relax(const char *s)
+{
+    if ( !*s )
+        opt_mmio_relax = 1;
+    else
+        opt_mmio_relax = parse_bool(s);
+    if ( opt_mmio_relax < 0 && strcmp(s, "all") )
+        opt_mmio_relax = 0;
+}
+custom_param("mmio-relax", parse_mmio_relax);
+
 static void __init init_frametable_chunk(void *start, void *end)
 {
     unsigned long s = (unsigned long)start;
@@ -766,6 +778,8 @@ get_page_from_l1e(
     if ( !mfn_valid(mfn) ||
          (real_pg_owner = page_get_owner_and_reference(page)) == dom_io )
     {
+        int flip = 0;
+
         /* Only needed the reference to confirm dom_io ownership. */
         if ( mfn_valid(mfn) )
             put_page(page);
@@ -798,13 +812,41 @@ get_page_from_l1e(
             return -EINVAL;
         }
 
-        if ( !(l1f & _PAGE_RW) ||
-             !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
-            return 0;
-        dprintk(XENLOG_G_WARNING,
-                "d%d: Forcing read-only access to MFN %lx\n",
-                l1e_owner->domain_id, mfn);
-        return 1;
+        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+        {
+            /* MMIO pages must not be mapped cachable unless requested so. */
+            switch ( opt_mmio_relax )
+            {
+            case 0:
+                break;
+            case 1:
+                if ( is_hardware_domain(l1e_owner) )
+            case -1:
+                    return 0;
+            default:
+                ASSERT_UNREACHABLE();
+            }
+        }
+        else if ( l1f & _PAGE_RW )
+        {
+            dprintk(XENLOG_G_WARNING,
+                    "d%d: Forcing read-only access to MFN %lx\n",
+                    l1e_owner->domain_id, mfn);
+            flip = _PAGE_RW;
+        }
+
+        switch ( l1f & PAGE_CACHE_ATTRS )
+        {
+        case 0: /* WB */
+            flip |= _PAGE_PWT | _PAGE_PCD;
+            break;
+        case _PAGE_PWT: /* WT */
+        case _PAGE_PWT | _PAGE_PAT: /* WP */
+            flip |= _PAGE_PCD | (l1f & _PAGE_PAT);
+            break;
+        }
+
+        return flip;
     }
 
     if ( unlikely( (real_pg_owner != pg_owner) &&
@@ -1194,8 +1236,9 @@ static int alloc_l1_table(struct page_in
                 goto fail;
             case 0:
                 break;
-            case 1:
-                l1e_remove_flags(pl1e[i], _PAGE_RW);
+            case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+                ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+                l1e_flip_flags(pl1e[i], ret);
                 break;
             }
 
@@ -1690,8 +1733,9 @@ static int mod_l1_entry(l1_pgentry_t *pl
             return -EINVAL;
         }
 
-        /* Fast path for identical mapping, r/w and presence. */
-        if ( !l1e_has_changed(ol1e, nl1e, _PAGE_RW | _PAGE_PRESENT) )
+        /* Fast path for identical mapping, r/w, presence, and cachability. */
+        if ( !l1e_has_changed(ol1e, nl1e,
+                              PAGE_CACHE_ATTRS | _PAGE_RW | _PAGE_PRESENT) )
         {
             adjust_guest_l1e(nl1e, pt_dom);
             if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
@@ -1714,8 +1758,9 @@ static int mod_l1_entry(l1_pgentry_t *pl
             return rc;
         case 0:
             break;
-        case 1:
-            l1e_remove_flags(nl1e, _PAGE_RW);
+        case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+            ASSERT(!(rc & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+            l1e_flip_flags(nl1e, rc);
             rc = 0;
             break;
         }
@@ -4920,6 +4965,7 @@ static int ptwr_emulated_update(
     l1_pgentry_t pte, ol1e, nl1e, *pl1e;
     struct vcpu *v = current;
     struct domain *d = v->domain;
+    int ret;
 
     /* Only allow naturally-aligned stores within the original %cr2 page. */
     if ( unlikely(((addr^ptwr_ctxt->cr2) & PAGE_MASK) || (addr & (bytes-1))) )
@@ -4967,7 +5013,7 @@ static int ptwr_emulated_update(
 
     /* Check the new PTE. */
     nl1e = l1e_from_intpte(val);
-    switch ( get_page_from_l1e(nl1e, d, d) )
+    switch ( ret = get_page_from_l1e(nl1e, d, d) )
     {
     default:
         if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
@@ -4991,8 +5037,9 @@ static int ptwr_emulated_update(
         break;
     case 0:
         break;
-    case 1:
-        l1e_remove_flags(nl1e, _PAGE_RW);
+    case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+        ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+        l1e_flip_flags(nl1e, ret);
         break;
     }
 
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -157,6 +157,9 @@ static inline l4_pgentry_t l4e_from_padd
 #define l3e_remove_flags(x, flags) ((x).l3 &= ~put_pte_flags(flags))
 #define l4e_remove_flags(x, flags) ((x).l4 &= ~put_pte_flags(flags))
 
+/* Flip flags in an existing L1 PTE. */
+#define l1e_flip_flags(x, flags)    ((x).l1 ^= put_pte_flags(flags))
+
 /* Check if a pte's page mapping or significant access flags have changed. */
 #define l1e_has_changed(x,y,flags) \
     ( !!(((x).l1 ^ (y).l1) & ((PADDR_MASK&PAGE_MASK)|put_pte_flags(flags))) )

[-- Attachment #6: xsa154-4.6.patch --]
[-- Type: application/octet-stream, Size: 13247 bytes --]

x86: enforce consistent cachability of MMIO mappings

We've been told by Intel that inconsistent cachability between
multiple mappings of the same page can affect system stability only
when the affected page is an MMIO one. Since the stale data issue is
of no relevance to the hypervisor (since all guest memory accesses go
through proper accessors and validation), handling of RAM pages
remains unchanged here. Any MMIO mapped by domains however needs to be
done consistently (all cachable mappings or all uncachable ones), in
order to avoid Machine Check exceptions. Since converting existing
cachable mappings to uncachable (at the time an uncachable mapping
gets established) would in the PV case require tracking all mappings,
allow MMIO to only get mapped uncachable (UC, UC-, or WC).

This also implies that in the PV case we mustn't use the L1 PTE update
fast path when cachability flags get altered.

Since in the HVM case at least for now we want to continue honoring
pinned cachability attributes for pages not mapped by the hypervisor,
special case handling of r/o MMIO pages (forcing UC) gets added there.
Arguably the counterpart change to p2m-pt.c may not be necessary, since
UC- (which already gets enforced there) is probably strict enough.

Note that the shadow code changes include fixing the write protection
of r/o MMIO ranges: shadow_l1e_remove_flags() and its siblings, other
than l1e_remove_flags() and alike, return the new PTE (and hence
ignoring their return values makes them no-ops).

This is CVE-2016-2270 / XSA-154.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1080,6 +1080,15 @@ limit is ignored by Xen.
 
 Specify if the MMConfig space should be enabled.
 
+### mmio-relax
+> `= <boolean> | all`
+
+> Default: `false`
+
+By default, domains may not create cached mappings to MMIO regions.
+This option relaxes the check for Domain 0 (or when using `all`, all PV
+domains), to permit the use of cacheable MMIO mappings.
+
 ### msi
 > `= <boolean>`
 
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -807,8 +807,17 @@ int epte_get_entry_emt(struct domain *d,
     if ( v->domain != d )
         v = d->vcpu ? d->vcpu[0] : NULL;
 
-    if ( !mfn_valid(mfn_x(mfn)) )
+    if ( !mfn_valid(mfn_x(mfn)) ||
+         rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
+                                 mfn_x(mfn) + (1UL << order) - 1) )
+    {
+        *ipat = 1;
         return MTRR_TYPE_UNCACHABLE;
+    }
+
+    if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
+                                 mfn_x(mfn) + (1UL << order) - 1) )
+        return -1;
 
     switch ( hvm_get_mem_pinned_cacheattr(d, gfn, order, &type) )
     {
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -107,6 +107,8 @@ static unsigned long p2m_type_to_flags(p
     case p2m_mmio_direct:
         if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) )
             flags |= _PAGE_RW;
+        else
+            flags |= _PAGE_PWT;
         return flags | P2M_BASE_FLAGS | _PAGE_PCD;
     }
 }
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -519,6 +519,7 @@ _sh_propagate(struct vcpu *v,
     gfn_t target_gfn = guest_l1e_get_gfn(guest_entry);
     u32 pass_thru_flags;
     u32 gflags, sflags;
+    bool_t mmio_mfn;
 
     /* We don't shadow PAE l3s */
     ASSERT(GUEST_PAGING_LEVELS > 3 || level != 3);
@@ -559,7 +560,10 @@ _sh_propagate(struct vcpu *v,
     // mfn means that we can not usefully shadow anything, and so we
     // return early.
     //
-    if ( !mfn_valid(target_mfn)
+    mmio_mfn = !mfn_valid(target_mfn)
+               || (level == 1
+                   && page_get_owner(mfn_to_page(target_mfn)) == dom_io);
+    if ( mmio_mfn
          && !(level == 1 && (!shadow_mode_refcounts(d)
                              || p2mt == p2m_mmio_direct)) )
     {
@@ -577,7 +581,7 @@ _sh_propagate(struct vcpu *v,
                        _PAGE_RW | _PAGE_PRESENT);
     if ( guest_supports_nx(v) )
         pass_thru_flags |= _PAGE_NX_BIT;
-    if ( !shadow_mode_refcounts(d) && !mfn_valid(target_mfn) )
+    if ( level == 1 && !shadow_mode_refcounts(d) && mmio_mfn )
         pass_thru_flags |= _PAGE_PAT | _PAGE_PCD | _PAGE_PWT;
     sflags = gflags & pass_thru_flags;
 
@@ -676,10 +680,14 @@ _sh_propagate(struct vcpu *v,
     }
 
     /* Read-only memory */
-    if ( p2m_is_readonly(p2mt) ||
-         (p2mt == p2m_mmio_direct &&
-          rangeset_contains_singleton(mmio_ro_ranges, mfn_x(target_mfn))) )
+    if ( p2m_is_readonly(p2mt) )
         sflags &= ~_PAGE_RW;
+    else if ( p2mt == p2m_mmio_direct &&
+              rangeset_contains_singleton(mmio_ro_ranges, mfn_x(target_mfn)) )
+    {
+        sflags &= ~(_PAGE_RW | _PAGE_PAT);
+        sflags |= _PAGE_PCD | _PAGE_PWT;
+    }
 
     // protect guest page tables
     //
@@ -1185,22 +1193,28 @@ static int shadow_set_l1e(struct domain
          && !sh_l1e_is_magic(new_sl1e) )
     {
         /* About to install a new reference */
-        if ( shadow_mode_refcounts(d) ) {
+        if ( shadow_mode_refcounts(d) )
+        {
+#define PAGE_FLIPPABLE (_PAGE_RW | _PAGE_PWT | _PAGE_PCD | _PAGE_PAT)
+            int rc;
+
             TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_SHADOW_L1_GET_REF);
-            switch ( shadow_get_page_from_l1e(new_sl1e, d, new_type) )
+            switch ( rc = shadow_get_page_from_l1e(new_sl1e, d, new_type) )
             {
             default:
                 /* Doesn't look like a pagetable. */
                 flags |= SHADOW_SET_ERROR;
                 new_sl1e = shadow_l1e_empty();
                 break;
-            case 1:
-                shadow_l1e_remove_flags(new_sl1e, _PAGE_RW);
+            case PAGE_FLIPPABLE & -PAGE_FLIPPABLE ... PAGE_FLIPPABLE:
+                ASSERT(!(rc & ~PAGE_FLIPPABLE));
+                new_sl1e = shadow_l1e_flip_flags(new_sl1e, rc);
                 /* fall through */
             case 0:
                 shadow_vram_get_l1e(new_sl1e, sl1e, sl1mfn, d);
                 break;
             }
+#undef PAGE_FLIPPABLE
         }
     }
 
--- a/xen/arch/x86/mm/shadow/types.h
+++ b/xen/arch/x86/mm/shadow/types.h
@@ -99,6 +99,9 @@ static inline u32 shadow_l4e_get_flags(s
 static inline shadow_l1e_t
 shadow_l1e_remove_flags(shadow_l1e_t sl1e, u32 flags)
 { l1e_remove_flags(sl1e, flags); return sl1e; }
+static inline shadow_l1e_t
+shadow_l1e_flip_flags(shadow_l1e_t sl1e, u32 flags)
+{ l1e_flip_flags(sl1e, flags); return sl1e; }
 
 static inline shadow_l1e_t shadow_l1e_empty(void)
 { return l1e_empty(); }
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -178,6 +178,18 @@ static uint32_t base_disallow_mask;
       is_pv_domain(d)) ?                                        \
      L1_DISALLOW_MASK : (L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS))
 
+static s8 __read_mostly opt_mmio_relax;
+static void __init parse_mmio_relax(const char *s)
+{
+    if ( !*s )
+        opt_mmio_relax = 1;
+    else
+        opt_mmio_relax = parse_bool(s);
+    if ( opt_mmio_relax < 0 && strcmp(s, "all") )
+        opt_mmio_relax = 0;
+}
+custom_param("mmio-relax", parse_mmio_relax);
+
 static void __init init_frametable_chunk(void *start, void *end)
 {
     unsigned long s = (unsigned long)start;
@@ -799,10 +811,7 @@ get_page_from_l1e(
     if ( !mfn_valid(mfn) ||
          (real_pg_owner = page_get_owner_and_reference(page)) == dom_io )
     {
-#ifndef NDEBUG
-        const unsigned long *ro_map;
-        unsigned int seg, bdf;
-#endif
+        int flip = 0;
 
         /* Only needed the reference to confirm dom_io ownership. */
         if ( mfn_valid(mfn) )
@@ -836,24 +845,55 @@ get_page_from_l1e(
             return -EINVAL;
         }
 
-        if ( !(l1f & _PAGE_RW) ||
-             !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
-            return 0;
+        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+        {
+            /* MMIO pages must not be mapped cachable unless requested so. */
+            switch ( opt_mmio_relax )
+            {
+            case 0:
+                break;
+            case 1:
+                if ( is_hardware_domain(l1e_owner) )
+            case -1:
+                    return 0;
+            default:
+                ASSERT_UNREACHABLE();
+            }
+        }
+        else if ( l1f & _PAGE_RW )
+        {
 #ifndef NDEBUG
-        if ( !pci_mmcfg_decode(mfn, &seg, &bdf) ||
-             ((ro_map = pci_get_ro_map(seg)) != NULL &&
-              test_bit(bdf, ro_map)) )
-            printk(XENLOG_G_WARNING
-                   "d%d: Forcing read-only access to MFN %lx\n",
-                   l1e_owner->domain_id, mfn);
-        else
-            rangeset_report_ranges(mmio_ro_ranges, 0, ~0UL,
-                                   print_mmio_emul_range,
-                                   &(struct mmio_emul_range_ctxt){
-                                      .d = l1e_owner,
-                                      .mfn = mfn });
+            const unsigned long *ro_map;
+            unsigned int seg, bdf;
+
+            if ( !pci_mmcfg_decode(mfn, &seg, &bdf) ||
+                 ((ro_map = pci_get_ro_map(seg)) != NULL &&
+                  test_bit(bdf, ro_map)) )
+                printk(XENLOG_G_WARNING
+                       "d%d: Forcing read-only access to MFN %lx\n",
+                       l1e_owner->domain_id, mfn);
+            else
+                rangeset_report_ranges(mmio_ro_ranges, 0, ~0UL,
+                                       print_mmio_emul_range,
+                                       &(struct mmio_emul_range_ctxt){
+                                           .d = l1e_owner,
+                                           .mfn = mfn });
 #endif
-        return 1;
+            flip = _PAGE_RW;
+        }
+
+        switch ( l1f & PAGE_CACHE_ATTRS )
+        {
+        case 0: /* WB */
+            flip |= _PAGE_PWT | _PAGE_PCD;
+            break;
+        case _PAGE_PWT: /* WT */
+        case _PAGE_PWT | _PAGE_PAT: /* WP */
+            flip |= _PAGE_PCD | (l1f & _PAGE_PAT);
+            break;
+        }
+
+        return flip;
     }
 
     if ( unlikely( (real_pg_owner != pg_owner) &&
@@ -1243,8 +1283,9 @@ static int alloc_l1_table(struct page_in
                 goto fail;
             case 0:
                 break;
-            case 1:
-                l1e_remove_flags(pl1e[i], _PAGE_RW);
+            case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+                ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+                l1e_flip_flags(pl1e[i], ret);
                 break;
             }
 
@@ -1759,8 +1800,9 @@ static int mod_l1_entry(l1_pgentry_t *pl
             return -EINVAL;
         }
 
-        /* Fast path for identical mapping, r/w and presence. */
-        if ( !l1e_has_changed(ol1e, nl1e, _PAGE_RW | _PAGE_PRESENT) )
+        /* Fast path for identical mapping, r/w, presence, and cachability. */
+        if ( !l1e_has_changed(ol1e, nl1e,
+                              PAGE_CACHE_ATTRS | _PAGE_RW | _PAGE_PRESENT) )
         {
             adjust_guest_l1e(nl1e, pt_dom);
             if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
@@ -1783,8 +1825,9 @@ static int mod_l1_entry(l1_pgentry_t *pl
             return rc;
         case 0:
             break;
-        case 1:
-            l1e_remove_flags(nl1e, _PAGE_RW);
+        case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+            ASSERT(!(rc & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+            l1e_flip_flags(nl1e, rc);
             rc = 0;
             break;
         }
@@ -5000,6 +5043,7 @@ static int ptwr_emulated_update(
     l1_pgentry_t pte, ol1e, nl1e, *pl1e;
     struct vcpu *v = current;
     struct domain *d = v->domain;
+    int ret;
 
     /* Only allow naturally-aligned stores within the original %cr2 page. */
     if ( unlikely(((addr^ptwr_ctxt->cr2) & PAGE_MASK) || (addr & (bytes-1))) )
@@ -5047,7 +5091,7 @@ static int ptwr_emulated_update(
 
     /* Check the new PTE. */
     nl1e = l1e_from_intpte(val);
-    switch ( get_page_from_l1e(nl1e, d, d) )
+    switch ( ret = get_page_from_l1e(nl1e, d, d) )
     {
     default:
         if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
@@ -5071,8 +5115,9 @@ static int ptwr_emulated_update(
         break;
     case 0:
         break;
-    case 1:
-        l1e_remove_flags(nl1e, _PAGE_RW);
+    case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+        ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+        l1e_flip_flags(nl1e, ret);
         break;
     }
 
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -157,6 +157,9 @@ static inline l4_pgentry_t l4e_from_padd
 #define l3e_remove_flags(x, flags) ((x).l3 &= ~put_pte_flags(flags))
 #define l4e_remove_flags(x, flags) ((x).l4 &= ~put_pte_flags(flags))
 
+/* Flip flags in an existing L1 PTE. */
+#define l1e_flip_flags(x, flags)    ((x).l1 ^= put_pte_flags(flags))
+
 /* Check if a pte's page mapping or significant access flags have changed. */
 #define l1e_has_changed(x,y,flags) \
     ( !!(((x).l1 ^ (y).l1) & ((PADDR_MASK&PAGE_MASK)|put_pte_flags(flags))) )

[-- Attachment #7: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Thread overview: 21+ messages
2016-02-17 12:28 Xen Security Advisory 154 (CVE-2016-2270) - x86: inconsistent cachability flags on guest mappings Xen.org security team
2017-01-25 14:08 ` David Woodhouse
2017-01-25 14:21   ` Jan Beulich
2017-01-25 14:34     ` David Woodhouse
2017-01-25 16:08     ` David Woodhouse
2017-01-26  8:57     ` [PATCH] x86: Allow write-combining on MMIO mappings again David Woodhouse
2017-01-26 10:45       ` Jan Beulich
2017-01-26 10:55         ` David Woodhouse
2017-01-26 11:32           ` Jan Beulich
2017-01-26 12:39         ` [PATCH v2] x86/ept: Allow write-combining on !mfn_valid() " David Woodhouse
2017-01-26 14:35           ` Jan Beulich
2017-01-26 14:42             ` David Woodhouse
2017-01-26 14:50       ` [PATCH v3] " David Woodhouse
2017-01-26 15:48         ` Jan Beulich
2017-01-27 15:36           ` Konrad Rzeszutek Wilk
2017-02-06 11:33             ` David Woodhouse
2017-02-07  5:08               ` Tian, Kevin
2017-04-14  7:51               ` Tian, Kevin
2017-02-07  5:05           ` Tian, Kevin
2017-02-08 16:04         ` David Woodhouse
2017-02-01 20:23     ` Xen Security Advisory 154 (CVE-2016-2270) - x86: inconsistent cachability flags on guest mappings David Woodhouse
