Subject: Xen Security Advisory 154 (CVE-2016-2270) - x86: inconsistent cachability flags on guest mappings
From: Xen.org security team @ 2016-02-17 12:28 UTC
  To: xen-announce, xen-devel, xen-users, oss-security; +Cc: Xen.org security team

[-- Attachment #1: Type: text/plain, Size: 4977 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

            Xen Security Advisory CVE-2016-2270 / XSA-154
                              version 3

          x86: inconsistent cachability flags on guest mappings

UPDATES IN VERSION 3
====================

Clarify cumbersome Resolution wording.

The patch now adds a command line option to overcome the possible
performance regression.  Add patch backports.

Clarify the origin of the assertion (at the start of the patch
description) that inconsistent cacheability is a problem only for MMIO
pages.

Public release.

ISSUE DESCRIPTION
=================

Multiple mappings of the same physical page with different cachability
settings can cause problems.  While one category (the risk of using
stale data) affects only the guests themselves (and hence avoiding it
can be left under their control), the other category, Machine Check
exceptions, can be fatal to the entire host.  According to the
information we were able to gather, only mappings of MMIO pages may
surface this second category, but even for those there were cases where
the hypervisor did not properly enforce consistent cachability.

IMPACT
======

A malicious guest administrator might be able to cause a reboot,
denying service to the entire host.

VULNERABLE SYSTEMS
==================

All Xen versions are affected.

Only x86 guests given control over some physical device can trigger
this vulnerability.

x86 systems are vulnerable.  ARM systems are not vulnerable.

The vulnerability depends on the system's response to mapping the same
memory with different cacheability.  On some systems this is harmless;
on others, depending on the CPU and chipset, it may be fatal.

MITIGATION
==========

Not handing physical devices to guests will avoid this issue.
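
To help identify which guests have been given control over physical
devices, the xl toolstack can list the PCI devices assigned to a
domain.  A minimal sketch, assuming the xl toolstack is in use (the
domain name "guest1" is purely illustrative):

$ xl pci-list guest1

A guest for which this reports no assigned devices cannot trigger the
issue (unless devices are assigned to it later).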

CREDITS
=======

This issue was discovered by Jan Beulich of SUSE.

RESOLUTION
==========

We believe that the attached patch fixes the issue.  However, no
formal description of CPU behaviour in particular use cases has been
provided by Intel.  There has been no response from AMD.

We are aware of a potential performance regression with this patch on
some systems, even when no hardware passthrough is configured.  This is
due to the behaviour of certain drivers and peripherals, which is
beyond the scope of this security fix.  The patch adds a command line
option, "mmio-relax", to overcome this possible regression for Domain 0
or for all para-virtual guests.  Note, however, that enabling this
workaround reinstates the security issue these patches aim to address.
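
For illustration only, the option is given on the hypervisor (not the
dom0 kernel) command line.  A minimal sketch, assuming a GRUB 2 setup
where the Xen command line is taken from GRUB_CMDLINE_XEN_DEFAULT in
/etc/default/grub (file locations and variable names vary by
distribution):

# /etc/default/grub -- append to any existing Xen options:
GRUB_CMDLINE_XEN_DEFAULT="mmio-relax"        # relax the check for Domain 0 only
#GRUB_CMDLINE_XEN_DEFAULT="mmio-relax=all"   # relax it for all PV domains

$ update-grub    # or: grub2-mkconfig -o /boot/grub2/grub.cfg; then reboot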

xsa154.patch        xen-unstable
xsa154-4.6.patch    Xen 4.6.x
xsa154-4.5.patch    Xen 4.5.x
xsa154-4.4.patch    Xen 4.4.x
xsa154-4.3.patch    Xen 4.3.x

$ sha256sum xsa154*
bbe7fba38ee30c00ef850fa6419c769e88b5669164d447f50b1ebbe333573152  xsa154.patch
011a4e33c0e476c52fe44253d50e01a1185948fd1b2a8e645274b25da6030d71  xsa154-4.3.patch
92d475bbc344127faa4f0183a9ccca9e975c7d24eb5772bf0a0a0a2e019144c6  xsa154-4.4.patch
b13737e71f22185b94ab25c07afd521add1a7e3886326c719d5df4d42f3f87f4  xsa154-4.5.patch
eec88c2a57466f83a81844cb7025f70c2b671d07a75d85487d4ed73cdabbb020  xsa154-4.6.patch
$
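
The backports are conventional unified diffs with a/ and b/ path
prefixes, so they apply from the top level of the corresponding Xen
source tree.  A minimal sketch for a 4.6.x tree (the directory name
xen-4.6.1 is purely illustrative; rebuild and reinstall the hypervisor
afterwards in whatever way is usual for your deployment):

$ cd xen-4.6.1
$ patch -p1 < ../xsa154-4.6.patch
$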

DEPLOYMENT DURING EMBARGO
=========================

Deployment of the patch described above (or others which are
substantially similar) is permitted during the embargo, even on
public-facing systems with untrusted guest users and administrators.

However, deployment of the mitigations described above is NOT permitted
(except where all the affected systems and VMs are administered and
used only by organisations which are members of the Xen Project
Security Issues Predisclosure List).  Specifically, deployment on
public cloud systems is NOT permitted.

This is because the configuration change would be visible to the guest,
which could lead to the rediscovery of the vulnerability.

But: Distribution of updated software is prohibited (except to other
members of the predisclosure list).

Predisclosure list members who wish to deploy significantly different
patches and/or mitigations, please contact the Xen Project Security
Team.


(Note: this during-embargo deployment notice is retained in
post-embargo publicly released Xen Project advisories, even though it
is then no longer applicable.  This is to enable the community to have
oversight of the Xen Project Security Team's decision-making.)

For more information about permissible uses of embargoed information,
consult the Xen Project community's agreed Security Policy:
  http://www.xenproject.org/security-policy.html
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)

iQEcBAEBAgAGBQJWxGayAAoJEIP+FMlX6CvZ9KwH/3z+9b7OjgpuIsOf0giZ5y99
yKoORWxQjcosYLQRQXvH62xtz0xRng+E3p+MeUm2qPUUuHFiqxSpZOAvW61C6DQL
l5KNNHlIjWB3N0YVmvgRbf3WMbeX1DCsEJEIFxZUQQs3fgGAiOfIEOwRL2FIhJ5Y
wP/z59fCuWs5lHoV0iAY3gkZHDd09JspCRQq8UGAc+X5jHF6fIOhUjZCS9KRQMJ5
p69ysdMj96fY5eKqwka/EXzvKMJUsQ42u5RQoYR5FhLx1UBi2otdcdbloKNseksA
7Wbf6j8Mz9NWVhvdZtnR/CNH8m5V7d78HsnGv7zNQCiMW+wg/k53yzHcw550P4w=
=5V3D
-----END PGP SIGNATURE-----

[-- Attachment #2: xsa154.patch --]
[-- Type: application/octet-stream, Size: 13272 bytes --]

x86: enforce consistent cachability of MMIO mappings

We've been told by Intel that inconsistent cachability between
multiple mappings of the same page can affect system stability only
when the affected page is an MMIO one. Since the stale data issue is
of no relevance to the hypervisor (since all guest memory accesses go
through proper accessors and validation), handling of RAM pages
remains unchanged here. Any MMIO mapped by domains however needs to be
done consistently (all cachable mappings or all uncachable ones), in
order to avoid Machine Check exceptions. Since converting existing
cachable mappings to uncachable (at the time an uncachable mapping
gets established) would in the PV case require tracking all mappings,
allow MMIO to only get mapped uncachable (UC, UC-, or WC).

This also implies that in the PV case we mustn't use the L1 PTE update
fast path when cachability flags get altered.

Since in the HVM case at least for now we want to continue honoring
pinned cachability attributes for pages not mapped by the hypervisor,
special case handling of r/o MMIO pages (forcing UC) gets added there.
Arguably the counterpart change to p2m-pt.c may not be necessary, since
UC- (which already gets enforced there) is probably strict enough.

Note that the shadow code changes include fixing the write protection
of r/o MMIO ranges: shadow_l1e_remove_flags() and its siblings, other
than l1e_remove_flags() and alike, return the new PTE (and hence
ignoring their return values makes them no-ops).

This is CVE-2016-2270 / XSA-154.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1084,6 +1084,15 @@ limit is ignored by Xen.
 
 Specify if the MMConfig space should be enabled.
 
+### mmio-relax
+> `= <boolean> | all`
+
+> Default: `false`
+
+By default, domains may not create cached mappings to MMIO regions.
+This option relaxes the check for Domain 0 (or when using `all`, all PV
+domains), to permit the use of cacheable MMIO mappings.
+
 ### msi
 > `= <boolean>`
 
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -770,8 +770,17 @@ int epte_get_entry_emt(struct domain *d,
     if ( v->domain != d )
         v = d->vcpu ? d->vcpu[0] : NULL;
 
-    if ( !mfn_valid(mfn_x(mfn)) )
+    if ( !mfn_valid(mfn_x(mfn)) ||
+         rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
+                                 mfn_x(mfn) + (1UL << order) - 1) )
+    {
+        *ipat = 1;
         return MTRR_TYPE_UNCACHABLE;
+    }
+
+    if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
+                                 mfn_x(mfn) + (1UL << order) - 1) )
+        return -1;
 
     switch ( hvm_get_mem_pinned_cacheattr(d, gfn, order, &type) )
     {
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -109,7 +109,10 @@ static unsigned long p2m_type_to_flags(p
         if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) )
             flags |= _PAGE_RW;
         else
+        {
+            flags |= _PAGE_PWT;
             ASSERT(!level);
+        }
         return flags | P2M_BASE_FLAGS | _PAGE_PCD;
     }
 }
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -534,6 +534,7 @@ _sh_propagate(struct vcpu *v,
     gfn_t target_gfn = guest_l1e_get_gfn(guest_entry);
     u32 pass_thru_flags;
     u32 gflags, sflags;
+    bool_t mmio_mfn;
 
     /* We don't shadow PAE l3s */
     ASSERT(GUEST_PAGING_LEVELS > 3 || level != 3);
@@ -574,7 +575,10 @@ _sh_propagate(struct vcpu *v,
     // mfn means that we can not usefully shadow anything, and so we
     // return early.
     //
-    if ( !mfn_valid(target_mfn)
+    mmio_mfn = !mfn_valid(target_mfn)
+               || (level == 1
+                   && page_get_owner(mfn_to_page(target_mfn)) == dom_io);
+    if ( mmio_mfn
          && !(level == 1 && (!shadow_mode_refcounts(d)
                              || p2mt == p2m_mmio_direct)) )
     {
@@ -592,7 +596,7 @@ _sh_propagate(struct vcpu *v,
                        _PAGE_RW | _PAGE_PRESENT);
     if ( guest_supports_nx(v) )
         pass_thru_flags |= _PAGE_NX_BIT;
-    if ( !shadow_mode_refcounts(d) && !mfn_valid(target_mfn) )
+    if ( level == 1 && !shadow_mode_refcounts(d) && mmio_mfn )
         pass_thru_flags |= _PAGE_PAT | _PAGE_PCD | _PAGE_PWT;
     sflags = gflags & pass_thru_flags;
 
@@ -691,10 +695,14 @@ _sh_propagate(struct vcpu *v,
     }
 
     /* Read-only memory */
-    if ( p2m_is_readonly(p2mt) ||
-         (p2mt == p2m_mmio_direct &&
-          rangeset_contains_singleton(mmio_ro_ranges, mfn_x(target_mfn))) )
+    if ( p2m_is_readonly(p2mt) )
         sflags &= ~_PAGE_RW;
+    else if ( p2mt == p2m_mmio_direct &&
+              rangeset_contains_singleton(mmio_ro_ranges, mfn_x(target_mfn)) )
+    {
+        sflags &= ~(_PAGE_RW | _PAGE_PAT);
+        sflags |= _PAGE_PCD | _PAGE_PWT;
+    }
 
     // protect guest page tables
     //
@@ -1200,22 +1208,28 @@ static int shadow_set_l1e(struct domain
          && !sh_l1e_is_magic(new_sl1e) )
     {
         /* About to install a new reference */
-        if ( shadow_mode_refcounts(d) ) {
+        if ( shadow_mode_refcounts(d) )
+        {
+#define PAGE_FLIPPABLE (_PAGE_RW | _PAGE_PWT | _PAGE_PCD | _PAGE_PAT)
+            int rc;
+
             TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_SHADOW_L1_GET_REF);
-            switch ( shadow_get_page_from_l1e(new_sl1e, d, new_type) )
+            switch ( rc = shadow_get_page_from_l1e(new_sl1e, d, new_type) )
             {
             default:
                 /* Doesn't look like a pagetable. */
                 flags |= SHADOW_SET_ERROR;
                 new_sl1e = shadow_l1e_empty();
                 break;
-            case 1:
-                shadow_l1e_remove_flags(new_sl1e, _PAGE_RW);
+            case PAGE_FLIPPABLE & -PAGE_FLIPPABLE ... PAGE_FLIPPABLE:
+                ASSERT(!(rc & ~PAGE_FLIPPABLE));
+                new_sl1e = shadow_l1e_flip_flags(new_sl1e, rc);
                 /* fall through */
             case 0:
                 shadow_vram_get_l1e(new_sl1e, sl1e, sl1mfn, d);
                 break;
             }
+#undef PAGE_FLIPPABLE
         }
     }
 
--- a/xen/arch/x86/mm/shadow/types.h
+++ b/xen/arch/x86/mm/shadow/types.h
@@ -99,6 +99,9 @@ static inline u32 shadow_l4e_get_flags(s
 static inline shadow_l1e_t
 shadow_l1e_remove_flags(shadow_l1e_t sl1e, u32 flags)
 { l1e_remove_flags(sl1e, flags); return sl1e; }
+static inline shadow_l1e_t
+shadow_l1e_flip_flags(shadow_l1e_t sl1e, u32 flags)
+{ l1e_flip_flags(sl1e, flags); return sl1e; }
 
 static inline shadow_l1e_t shadow_l1e_empty(void)
 { return l1e_empty(); }
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -178,6 +178,18 @@ static uint32_t base_disallow_mask;
       is_pv_domain(d)) ?                                        \
      L1_DISALLOW_MASK : (L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS))
 
+static s8 __read_mostly opt_mmio_relax;
+static void __init parse_mmio_relax(const char *s)
+{
+    if ( !*s )
+        opt_mmio_relax = 1;
+    else
+        opt_mmio_relax = parse_bool(s);
+    if ( opt_mmio_relax < 0 && strcmp(s, "all") )
+        opt_mmio_relax = 0;
+}
+custom_param("mmio-relax", parse_mmio_relax);
+
 static void __init init_frametable_chunk(void *start, void *end)
 {
     unsigned long s = (unsigned long)start;
@@ -871,10 +883,7 @@ get_page_from_l1e(
     if ( !mfn_valid(mfn) ||
          (real_pg_owner = page_get_owner_and_reference(page)) == dom_io )
     {
-#ifndef NDEBUG
-        const unsigned long *ro_map;
-        unsigned int seg, bdf;
-#endif
+        int flip = 0;
 
         /* Only needed the reference to confirm dom_io ownership. */
         if ( mfn_valid(mfn) )
@@ -908,24 +917,55 @@ get_page_from_l1e(
             return -EINVAL;
         }
 
-        if ( !(l1f & _PAGE_RW) ||
-             !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
-            return 0;
+        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+        {
+            /* MMIO pages must not be mapped cachable unless requested so. */
+            switch ( opt_mmio_relax )
+            {
+            case 0:
+                break;
+            case 1:
+                if ( is_hardware_domain(l1e_owner) )
+            case -1:
+                    return 0;
+            default:
+                ASSERT_UNREACHABLE();
+            }
+        }
+        else if ( l1f & _PAGE_RW )
+        {
 #ifndef NDEBUG
-        if ( !pci_mmcfg_decode(mfn, &seg, &bdf) ||
-             ((ro_map = pci_get_ro_map(seg)) != NULL &&
-              test_bit(bdf, ro_map)) )
-            printk(XENLOG_G_WARNING
-                   "d%d: Forcing read-only access to MFN %lx\n",
-                   l1e_owner->domain_id, mfn);
-        else
-            rangeset_report_ranges(mmio_ro_ranges, 0, ~0UL,
-                                   print_mmio_emul_range,
-                                   &(struct mmio_emul_range_ctxt){
-                                      .d = l1e_owner,
-                                      .mfn = mfn });
+            const unsigned long *ro_map;
+            unsigned int seg, bdf;
+
+            if ( !pci_mmcfg_decode(mfn, &seg, &bdf) ||
+                 ((ro_map = pci_get_ro_map(seg)) != NULL &&
+                  test_bit(bdf, ro_map)) )
+                printk(XENLOG_G_WARNING
+                       "d%d: Forcing read-only access to MFN %lx\n",
+                       l1e_owner->domain_id, mfn);
+            else
+                rangeset_report_ranges(mmio_ro_ranges, 0, ~0UL,
+                                       print_mmio_emul_range,
+                                       &(struct mmio_emul_range_ctxt){
+                                           .d = l1e_owner,
+                                           .mfn = mfn });
 #endif
-        return 1;
+            flip = _PAGE_RW;
+        }
+
+        switch ( l1f & PAGE_CACHE_ATTRS )
+        {
+        case 0: /* WB */
+            flip |= _PAGE_PWT | _PAGE_PCD;
+            break;
+        case _PAGE_PWT: /* WT */
+        case _PAGE_PWT | _PAGE_PAT: /* WP */
+            flip |= _PAGE_PCD | (l1f & _PAGE_PAT);
+            break;
+        }
+
+        return flip;
     }
 
     if ( unlikely( (real_pg_owner != pg_owner) &&
@@ -1315,8 +1355,9 @@ static int alloc_l1_table(struct page_in
                 goto fail;
             case 0:
                 break;
-            case 1:
-                l1e_remove_flags(pl1e[i], _PAGE_RW);
+            case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+                ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+                l1e_flip_flags(pl1e[i], ret);
                 break;
             }
 
@@ -1849,8 +1890,9 @@ static int mod_l1_entry(l1_pgentry_t *pl
             return -EINVAL;
         }
 
-        /* Fast path for identical mapping, r/w and presence. */
-        if ( !l1e_has_changed(ol1e, nl1e, _PAGE_RW | _PAGE_PRESENT) )
+        /* Fast path for identical mapping, r/w, presence, and cachability. */
+        if ( !l1e_has_changed(ol1e, nl1e,
+                              PAGE_CACHE_ATTRS | _PAGE_RW | _PAGE_PRESENT) )
         {
             adjust_guest_l1e(nl1e, pt_dom);
             if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
@@ -1873,8 +1915,9 @@ static int mod_l1_entry(l1_pgentry_t *pl
             return rc;
         case 0:
             break;
-        case 1:
-            l1e_remove_flags(nl1e, _PAGE_RW);
+        case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+            ASSERT(!(rc & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+            l1e_flip_flags(nl1e, rc);
             rc = 0;
             break;
         }
@@ -5109,6 +5152,7 @@ static int ptwr_emulated_update(
     l1_pgentry_t pte, ol1e, nl1e, *pl1e;
     struct vcpu *v = current;
     struct domain *d = v->domain;
+    int ret;
 
     /* Only allow naturally-aligned stores within the original %cr2 page. */
     if ( unlikely(((addr^ptwr_ctxt->cr2) & PAGE_MASK) || (addr & (bytes-1))) )
@@ -5156,7 +5200,7 @@ static int ptwr_emulated_update(
 
     /* Check the new PTE. */
     nl1e = l1e_from_intpte(val);
-    switch ( get_page_from_l1e(nl1e, d, d) )
+    switch ( ret = get_page_from_l1e(nl1e, d, d) )
     {
     default:
         if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
@@ -5180,8 +5224,9 @@ static int ptwr_emulated_update(
         break;
     case 0:
         break;
-    case 1:
-        l1e_remove_flags(nl1e, _PAGE_RW);
+    case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+        ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+        l1e_flip_flags(nl1e, ret);
         break;
     }
 
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -157,6 +157,9 @@ static inline l4_pgentry_t l4e_from_padd
 #define l3e_remove_flags(x, flags) ((x).l3 &= ~put_pte_flags(flags))
 #define l4e_remove_flags(x, flags) ((x).l4 &= ~put_pte_flags(flags))
 
+/* Flip flags in an existing L1 PTE. */
+#define l1e_flip_flags(x, flags)    ((x).l1 ^= put_pte_flags(flags))
+
 /* Check if a pte's page mapping or significant access flags have changed. */
 #define l1e_has_changed(x,y,flags) \
     ( !!(((x).l1 ^ (y).l1) & ((PADDR_MASK&PAGE_MASK)|put_pte_flags(flags))) )

[-- Attachment #3: xsa154-4.3.patch --]
[-- Type: application/octet-stream, Size: 11895 bytes --]

x86: enforce consistent cachability of MMIO mappings

We've been told by Intel that inconsistent cachability between
multiple mappings of the same page can affect system stability only
when the affected page is an MMIO one. Since the stale data issue is
of no relevance to the hypervisor (since all guest memory accesses go
through proper accessors and validation), handling of RAM pages
remains unchanged here. Any MMIO mapped by domains however needs to be
done consistently (all cachable mappings or all uncachable ones), in
order to avoid Machine Check exceptions. Since converting existing
cachable mappings to uncachable (at the time an uncachable mapping
gets established) would in the PV case require tracking all mappings,
allow MMIO to only get mapped uncachable (UC, UC-, or WC).

This also implies that in the PV case we mustn't use the L1 PTE update
fast path when cachability flags get altered.

Since in the HVM case at least for now we want to continue honoring
pinned cachability attributes for pages not mapped by the hypervisor,
special case handling of r/o MMIO pages (forcing UC) gets added there.
Arguably the counterpart change to p2m-pt.c may not be necessary, since
UC- (which already gets enforced there) is probably strict enough.

Note that the shadow code changes include fixing the write protection
of r/o MMIO ranges: shadow_l1e_remove_flags() and its siblings, other
than l1e_remove_flags() and alike, return the new PTE (and hence
ignoring their return values makes them no-ops).

This is CVE-2016-2270 / XSA-154.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -656,6 +656,15 @@ limit is ignored by Xen.
 
 Specify if the MMConfig space should be enabled.
 
+### mmio-relax
+> `= <boolean> | all`
+
+> Default: `false`
+
+By default, domains may not create cached mappings to MMIO regions.
+This option relaxes the check for Domain 0 (or when using `all`, all PV
+domains), to permit the use of cacheable MMIO mappings.
+
 ### msi
 > `= <boolean>`
 
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -693,8 +693,12 @@ uint8_t epte_get_entry_emt(struct domain
     if ( v->domain != d )
         v = d->vcpu ? d->vcpu[0] : NULL;
 
-    if ( !mfn_valid(mfn_x(mfn)) )
+    if ( !mfn_valid(mfn_x(mfn)) ||
+         rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) )
+    {
+        *ipat = 1;
         return MTRR_TYPE_UNCACHABLE;
+    }
 
     if ( hvm_get_mem_pinned_cacheattr(d, gfn, &type) )
         return type;
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -93,6 +93,8 @@ static unsigned long p2m_type_to_flags(p
     case p2m_mmio_direct:
         if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) )
             flags |= _PAGE_RW;
+        else
+            flags |= _PAGE_PWT;
         return flags | P2M_BASE_FLAGS | _PAGE_PCD;
     }
 }
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -521,6 +521,7 @@ _sh_propagate(struct vcpu *v,
     gfn_t target_gfn = guest_l1e_get_gfn(guest_entry);
     u32 pass_thru_flags;
     u32 gflags, sflags;
+    bool_t mmio_mfn;
 
     /* We don't shadow PAE l3s */
     ASSERT(GUEST_PAGING_LEVELS > 3 || level != 3);
@@ -561,7 +562,10 @@ _sh_propagate(struct vcpu *v,
     // mfn means that we can not usefully shadow anything, and so we
     // return early.
     //
-    if ( !mfn_valid(target_mfn)
+    mmio_mfn = !mfn_valid(target_mfn)
+               || (level == 1
+                   && page_get_owner(mfn_to_page(target_mfn)) == dom_io);
+    if ( mmio_mfn
          && !(level == 1 && (!shadow_mode_refcounts(d) 
                              || p2mt == p2m_mmio_direct)) )
     {
@@ -579,7 +583,7 @@ _sh_propagate(struct vcpu *v,
                        _PAGE_RW | _PAGE_PRESENT);
     if ( guest_supports_nx(v) )
         pass_thru_flags |= _PAGE_NX_BIT;
-    if ( !shadow_mode_refcounts(d) && !mfn_valid(target_mfn) )
+    if ( level == 1 && !shadow_mode_refcounts(d) && mmio_mfn )
         pass_thru_flags |= _PAGE_PAT | _PAGE_PCD | _PAGE_PWT;
     sflags = gflags & pass_thru_flags;
 
@@ -676,10 +680,14 @@ _sh_propagate(struct vcpu *v,
     }
 
     /* Read-only memory */
-    if ( p2m_is_readonly(p2mt) ||
-         (p2mt == p2m_mmio_direct &&
-          rangeset_contains_singleton(mmio_ro_ranges, mfn_x(target_mfn))) )
+    if ( p2m_is_readonly(p2mt) )
         sflags &= ~_PAGE_RW;
+    else if ( p2mt == p2m_mmio_direct &&
+              rangeset_contains_singleton(mmio_ro_ranges, mfn_x(target_mfn)) )
+    {
+        sflags &= ~(_PAGE_RW | _PAGE_PAT);
+        sflags |= _PAGE_PCD | _PAGE_PWT;
+    }
     
     // protect guest page tables
     //
@@ -1201,22 +1209,28 @@ static int shadow_set_l1e(struct vcpu *v
          && !sh_l1e_is_magic(new_sl1e) ) 
     {
         /* About to install a new reference */        
-        if ( shadow_mode_refcounts(d) ) {
+        if ( shadow_mode_refcounts(d) )
+        {
+#define PAGE_FLIPPABLE (_PAGE_RW | _PAGE_PWT | _PAGE_PCD | _PAGE_PAT)
+            int rc;
+
             TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_SHADOW_L1_GET_REF);
-            switch ( shadow_get_page_from_l1e(new_sl1e, d, new_type) )
+            switch ( rc = shadow_get_page_from_l1e(new_sl1e, d, new_type) )
             {
             default:
                 /* Doesn't look like a pagetable. */
                 flags |= SHADOW_SET_ERROR;
                 new_sl1e = shadow_l1e_empty();
                 break;
-            case 1:
-                shadow_l1e_remove_flags(new_sl1e, _PAGE_RW);
+            case PAGE_FLIPPABLE & -PAGE_FLIPPABLE ... PAGE_FLIPPABLE:
+                ASSERT(!(rc & ~PAGE_FLIPPABLE));
+                new_sl1e = shadow_l1e_flip_flags(new_sl1e, rc);
                 /* fall through */
             case 0:
                 shadow_vram_get_l1e(new_sl1e, sl1e, sl1mfn, d);
                 break;
             }
+#undef PAGE_FLIPPABLE
         }
     } 
 
--- a/xen/arch/x86/mm/shadow/types.h
+++ b/xen/arch/x86/mm/shadow/types.h
@@ -100,6 +100,9 @@ static inline u32 shadow_l4e_get_flags(s
 static inline shadow_l1e_t
 shadow_l1e_remove_flags(shadow_l1e_t sl1e, u32 flags)
 { l1e_remove_flags(sl1e, flags); return sl1e; }
+static inline shadow_l1e_t
+shadow_l1e_flip_flags(shadow_l1e_t sl1e, u32 flags)
+{ l1e_flip_flags(sl1e, flags); return sl1e; }
 
 static inline shadow_l1e_t shadow_l1e_empty(void) 
 { return l1e_empty(); }
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -188,6 +188,18 @@ static uint32_t base_disallow_mask;
       !is_hvm_domain(d)) ?                                      \
      L1_DISALLOW_MASK : (L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS))
 
+static s8 __read_mostly opt_mmio_relax;
+static void __init parse_mmio_relax(const char *s)
+{
+    if ( !*s )
+        opt_mmio_relax = 1;
+    else
+        opt_mmio_relax = parse_bool(s);
+    if ( opt_mmio_relax < 0 && strcmp(s, "all") )
+        opt_mmio_relax = 0;
+}
+custom_param("mmio-relax", parse_mmio_relax);
+
 static void __init init_frametable_chunk(void *start, void *end)
 {
     unsigned long s = (unsigned long)start;
@@ -773,6 +785,8 @@ get_page_from_l1e(
     if ( !mfn_valid(mfn) ||
          (real_pg_owner = page_get_owner_and_reference(page)) == dom_io )
     {
+        int flip = 0;
+
         /* Only needed the reference to confirm dom_io ownership. */
         if ( mfn_valid(mfn) )
             put_page(page);
@@ -805,13 +819,41 @@ get_page_from_l1e(
             return -EINVAL;
         }
 
-        if ( !(l1f & _PAGE_RW) ||
-             !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
-            return 0;
-        dprintk(XENLOG_G_WARNING,
-                "d%d: Forcing read-only access to MFN %lx\n",
-                l1e_owner->domain_id, mfn);
-        return 1;
+        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+        {
+            /* MMIO pages must not be mapped cachable unless requested so. */
+            switch ( opt_mmio_relax )
+            {
+            case 0:
+                break;
+            case 1:
+                if ( is_hardware_domain(l1e_owner) )
+            case -1:
+                    return 0;
+            default:
+                ASSERT(0);
+            }
+        }
+        else if ( l1f & _PAGE_RW )
+        {
+            dprintk(XENLOG_G_WARNING,
+                    "d%d: Forcing read-only access to MFN %lx\n",
+                    l1e_owner->domain_id, mfn);
+            flip = _PAGE_RW;
+        }
+
+        switch ( l1f & PAGE_CACHE_ATTRS )
+        {
+        case 0: /* WB */
+            flip |= _PAGE_PWT | _PAGE_PCD;
+            break;
+        case _PAGE_PWT: /* WT */
+        case _PAGE_PWT | _PAGE_PAT: /* WP */
+            flip |= _PAGE_PCD | (l1f & _PAGE_PAT);
+            break;
+        }
+
+        return flip;
     }
 
     if ( unlikely( (real_pg_owner != pg_owner) &&
@@ -1210,8 +1252,9 @@ static int alloc_l1_table(struct page_in
                 goto fail;
             case 0:
                 break;
-            case 1:
-                l1e_remove_flags(pl1e[i], _PAGE_RW);
+            case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+                ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+                l1e_flip_flags(pl1e[i], ret);
                 break;
             }
 
@@ -1706,8 +1749,9 @@ static int mod_l1_entry(l1_pgentry_t *pl
             return -EINVAL;
         }
 
-        /* Fast path for identical mapping, r/w and presence. */
-        if ( !l1e_has_changed(ol1e, nl1e, _PAGE_RW | _PAGE_PRESENT) )
+        /* Fast path for identical mapping, r/w, presence, and cachability. */
+        if ( !l1e_has_changed(ol1e, nl1e,
+                              PAGE_CACHE_ATTRS | _PAGE_RW | _PAGE_PRESENT) )
         {
             adjust_guest_l1e(nl1e, pt_dom);
             if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
@@ -1730,8 +1774,9 @@ static int mod_l1_entry(l1_pgentry_t *pl
             return rc;
         case 0:
             break;
-        case 1:
-            l1e_remove_flags(nl1e, _PAGE_RW);
+        case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+            ASSERT(!(rc & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+            l1e_flip_flags(nl1e, rc);
             rc = 0;
             break;
         }
@@ -5010,6 +5055,7 @@ static int ptwr_emulated_update(
     l1_pgentry_t pte, ol1e, nl1e, *pl1e;
     struct vcpu *v = current;
     struct domain *d = v->domain;
+    int ret;
 
     /* Only allow naturally-aligned stores within the original %cr2 page. */
     if ( unlikely(((addr^ptwr_ctxt->cr2) & PAGE_MASK) || (addr & (bytes-1))) )
@@ -5057,7 +5103,7 @@ static int ptwr_emulated_update(
 
     /* Check the new PTE. */
     nl1e = l1e_from_intpte(val);
-    switch ( get_page_from_l1e(nl1e, d, d) )
+    switch ( ret = get_page_from_l1e(nl1e, d, d) )
     {
     default:
         if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
@@ -5081,8 +5127,9 @@ static int ptwr_emulated_update(
         break;
     case 0:
         break;
-    case 1:
-        l1e_remove_flags(nl1e, _PAGE_RW);
+    case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+        ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+        l1e_flip_flags(nl1e, ret);
         break;
     }
 
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -157,6 +157,9 @@ static inline l4_pgentry_t l4e_from_padd
 #define l3e_remove_flags(x, flags) ((x).l3 &= ~put_pte_flags(flags))
 #define l4e_remove_flags(x, flags) ((x).l4 &= ~put_pte_flags(flags))
 
+/* Flip flags in an existing L1 PTE. */
+#define l1e_flip_flags(x, flags)    ((x).l1 ^= put_pte_flags(flags))
+
 /* Check if a pte's page mapping or significant access flags have changed. */
 #define l1e_has_changed(x,y,flags) \
     ( !!(((x).l1 ^ (y).l1) & ((PADDR_MASK&PAGE_MASK)|put_pte_flags(flags))) )

[-- Attachment #4: xsa154-4.4.patch --]
[-- Type: application/octet-stream, Size: 11906 bytes --]

x86: enforce consistent cachability of MMIO mappings

We've been told by Intel that inconsistent cachability between
multiple mappings of the same page can affect system stability only
when the affected page is an MMIO one. Since the stale data issue is
of no relevance to the hypervisor (since all guest memory accesses go
through proper accessors and validation), handling of RAM pages
remains unchanged here. Any MMIO mapped by domains however needs to be
done consistently (all cachable mappings or all uncachable ones), in
order to avoid Machine Check exceptions. Since converting existing
cachable mappings to uncachable (at the time an uncachable mapping
gets established) would in the PV case require tracking all mappings,
allow MMIO to only get mapped uncachable (UC, UC-, or WC).

This also implies that in the PV case we mustn't use the L1 PTE update
fast path when cachability flags get altered.

Since in the HVM case at least for now we want to continue honoring
pinned cachability attributes for pages not mapped by the hypervisor,
special case handling of r/o MMIO pages (forcing UC) gets added there.
Arguably the counterpart change to p2m-pt.c may not be necessary, since
UC- (which already gets enforced there) is probably strict enough.

Note that the shadow code changes include fixing the write protection
of r/o MMIO ranges: shadow_l1e_remove_flags() and its siblings, other
than l1e_remove_flags() and alike, return the new PTE (and hence
ignoring their return values makes them no-ops).

This is CVE-2016-2270 / XSA-154.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -697,6 +697,15 @@ limit is ignored by Xen.
 
 Specify if the MMConfig space should be enabled.
 
+### mmio-relax
+> `= <boolean> | all`
+
+> Default: `false`
+
+By default, domains may not create cached mappings to MMIO regions.
+This option relaxes the check for Domain 0 (or when using `all`, all PV
+domains), to permit the use of cacheable MMIO mappings.
+
 ### msi
 > `= <boolean>`
 
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -693,8 +693,12 @@ uint8_t epte_get_entry_emt(struct domain
     if ( v->domain != d )
         v = d->vcpu ? d->vcpu[0] : NULL;
 
-    if ( !mfn_valid(mfn_x(mfn)) )
+    if ( !mfn_valid(mfn_x(mfn)) ||
+         rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) )
+    {
+        *ipat = 1;
         return MTRR_TYPE_UNCACHABLE;
+    }
 
     if ( hvm_get_mem_pinned_cacheattr(d, gfn, &type) )
         return type;
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -93,6 +93,8 @@ static unsigned long p2m_type_to_flags(p
     case p2m_mmio_direct:
         if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) )
             flags |= _PAGE_RW;
+        else
+            flags |= _PAGE_PWT;
         return flags | P2M_BASE_FLAGS | _PAGE_PCD;
     }
 }
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -521,6 +521,7 @@ _sh_propagate(struct vcpu *v,
     gfn_t target_gfn = guest_l1e_get_gfn(guest_entry);
     u32 pass_thru_flags;
     u32 gflags, sflags;
+    bool_t mmio_mfn;
 
     /* We don't shadow PAE l3s */
     ASSERT(GUEST_PAGING_LEVELS > 3 || level != 3);
@@ -561,7 +562,10 @@ _sh_propagate(struct vcpu *v,
     // mfn means that we can not usefully shadow anything, and so we
     // return early.
     //
-    if ( !mfn_valid(target_mfn)
+    mmio_mfn = !mfn_valid(target_mfn)
+               || (level == 1
+                   && page_get_owner(mfn_to_page(target_mfn)) == dom_io);
+    if ( mmio_mfn
          && !(level == 1 && (!shadow_mode_refcounts(d) 
                              || p2mt == p2m_mmio_direct)) )
     {
@@ -579,7 +583,7 @@ _sh_propagate(struct vcpu *v,
                        _PAGE_RW | _PAGE_PRESENT);
     if ( guest_supports_nx(v) )
         pass_thru_flags |= _PAGE_NX_BIT;
-    if ( !shadow_mode_refcounts(d) && !mfn_valid(target_mfn) )
+    if ( level == 1 && !shadow_mode_refcounts(d) && mmio_mfn )
         pass_thru_flags |= _PAGE_PAT | _PAGE_PCD | _PAGE_PWT;
     sflags = gflags & pass_thru_flags;
 
@@ -676,10 +680,14 @@ _sh_propagate(struct vcpu *v,
     }
 
     /* Read-only memory */
-    if ( p2m_is_readonly(p2mt) ||
-         (p2mt == p2m_mmio_direct &&
-          rangeset_contains_singleton(mmio_ro_ranges, mfn_x(target_mfn))) )
+    if ( p2m_is_readonly(p2mt) )
         sflags &= ~_PAGE_RW;
+    else if ( p2mt == p2m_mmio_direct &&
+              rangeset_contains_singleton(mmio_ro_ranges, mfn_x(target_mfn)) )
+    {
+        sflags &= ~(_PAGE_RW | _PAGE_PAT);
+        sflags |= _PAGE_PCD | _PAGE_PWT;
+    }
     
     // protect guest page tables
     //
@@ -1201,22 +1209,28 @@ static int shadow_set_l1e(struct vcpu *v
          && !sh_l1e_is_magic(new_sl1e) ) 
     {
         /* About to install a new reference */        
-        if ( shadow_mode_refcounts(d) ) {
+        if ( shadow_mode_refcounts(d) )
+        {
+#define PAGE_FLIPPABLE (_PAGE_RW | _PAGE_PWT | _PAGE_PCD | _PAGE_PAT)
+            int rc;
+
             TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_SHADOW_L1_GET_REF);
-            switch ( shadow_get_page_from_l1e(new_sl1e, d, new_type) )
+            switch ( rc = shadow_get_page_from_l1e(new_sl1e, d, new_type) )
             {
             default:
                 /* Doesn't look like a pagetable. */
                 flags |= SHADOW_SET_ERROR;
                 new_sl1e = shadow_l1e_empty();
                 break;
-            case 1:
-                shadow_l1e_remove_flags(new_sl1e, _PAGE_RW);
+            case PAGE_FLIPPABLE & -PAGE_FLIPPABLE ... PAGE_FLIPPABLE:
+                ASSERT(!(rc & ~PAGE_FLIPPABLE));
+                new_sl1e = shadow_l1e_flip_flags(new_sl1e, rc);
                 /* fall through */
             case 0:
                 shadow_vram_get_l1e(new_sl1e, sl1e, sl1mfn, d);
                 break;
             }
+#undef PAGE_FLIPPABLE
         }
     } 
 
--- a/xen/arch/x86/mm/shadow/types.h
+++ b/xen/arch/x86/mm/shadow/types.h
@@ -100,6 +100,9 @@ static inline u32 shadow_l4e_get_flags(s
 static inline shadow_l1e_t
 shadow_l1e_remove_flags(shadow_l1e_t sl1e, u32 flags)
 { l1e_remove_flags(sl1e, flags); return sl1e; }
+static inline shadow_l1e_t
+shadow_l1e_flip_flags(shadow_l1e_t sl1e, u32 flags)
+{ l1e_flip_flags(sl1e, flags); return sl1e; }
 
 static inline shadow_l1e_t shadow_l1e_empty(void) 
 { return l1e_empty(); }
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -188,6 +188,18 @@ static uint32_t base_disallow_mask;
       is_pv_domain(d)) ?                                        \
      L1_DISALLOW_MASK : (L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS))
 
+static s8 __read_mostly opt_mmio_relax;
+static void __init parse_mmio_relax(const char *s)
+{
+    if ( !*s )
+        opt_mmio_relax = 1;
+    else
+        opt_mmio_relax = parse_bool(s);
+    if ( opt_mmio_relax < 0 && strcmp(s, "all") )
+        opt_mmio_relax = 0;
+}
+custom_param("mmio-relax", parse_mmio_relax);
+
 static void __init init_frametable_chunk(void *start, void *end)
 {
     unsigned long s = (unsigned long)start;
@@ -773,6 +785,8 @@ get_page_from_l1e(
     if ( !mfn_valid(mfn) ||
          (real_pg_owner = page_get_owner_and_reference(page)) == dom_io )
     {
+        int flip = 0;
+
         /* Only needed the reference to confirm dom_io ownership. */
         if ( mfn_valid(mfn) )
             put_page(page);
@@ -805,13 +819,41 @@ get_page_from_l1e(
             return -EINVAL;
         }
 
-        if ( !(l1f & _PAGE_RW) ||
-             !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
-            return 0;
-        dprintk(XENLOG_G_WARNING,
-                "d%d: Forcing read-only access to MFN %lx\n",
-                l1e_owner->domain_id, mfn);
-        return 1;
+        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+        {
+            /* MMIO pages must not be mapped cachable unless requested so. */
+            switch ( opt_mmio_relax )
+            {
+            case 0:
+                break;
+            case 1:
+                if ( is_hardware_domain(l1e_owner) )
+            case -1:
+                    return 0;
+            default:
+                ASSERT_UNREACHABLE();
+            }
+        }
+        else if ( l1f & _PAGE_RW )
+        {
+            dprintk(XENLOG_G_WARNING,
+                    "d%d: Forcing read-only access to MFN %lx\n",
+                    l1e_owner->domain_id, mfn);
+            flip = _PAGE_RW;
+        }
+
+        switch ( l1f & PAGE_CACHE_ATTRS )
+        {
+        case 0: /* WB */
+            flip |= _PAGE_PWT | _PAGE_PCD;
+            break;
+        case _PAGE_PWT: /* WT */
+        case _PAGE_PWT | _PAGE_PAT: /* WP */
+            flip |= _PAGE_PCD | (l1f & _PAGE_PAT);
+            break;
+        }
+
+        return flip;
     }
 
     if ( unlikely( (real_pg_owner != pg_owner) &&
@@ -1210,8 +1252,9 @@ static int alloc_l1_table(struct page_in
                 goto fail;
             case 0:
                 break;
-            case 1:
-                l1e_remove_flags(pl1e[i], _PAGE_RW);
+            case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+                ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+                l1e_flip_flags(pl1e[i], ret);
                 break;
             }
 
@@ -1706,8 +1749,9 @@ static int mod_l1_entry(l1_pgentry_t *pl
             return -EINVAL;
         }
 
-        /* Fast path for identical mapping, r/w and presence. */
-        if ( !l1e_has_changed(ol1e, nl1e, _PAGE_RW | _PAGE_PRESENT) )
+        /* Fast path for identical mapping, r/w, presence, and cachability. */
+        if ( !l1e_has_changed(ol1e, nl1e,
+                              PAGE_CACHE_ATTRS | _PAGE_RW | _PAGE_PRESENT) )
         {
             adjust_guest_l1e(nl1e, pt_dom);
             if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
@@ -1730,8 +1774,9 @@ static int mod_l1_entry(l1_pgentry_t *pl
             return rc;
         case 0:
             break;
-        case 1:
-            l1e_remove_flags(nl1e, _PAGE_RW);
+        case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+            ASSERT(!(rc & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+            l1e_flip_flags(nl1e, rc);
             rc = 0;
             break;
         }
@@ -4919,6 +4964,7 @@ static int ptwr_emulated_update(
     l1_pgentry_t pte, ol1e, nl1e, *pl1e;
     struct vcpu *v = current;
     struct domain *d = v->domain;
+    int ret;
 
     /* Only allow naturally-aligned stores within the original %cr2 page. */
     if ( unlikely(((addr^ptwr_ctxt->cr2) & PAGE_MASK) || (addr & (bytes-1))) )
@@ -4966,7 +5012,7 @@ static int ptwr_emulated_update(
 
     /* Check the new PTE. */
     nl1e = l1e_from_intpte(val);
-    switch ( get_page_from_l1e(nl1e, d, d) )
+    switch ( ret = get_page_from_l1e(nl1e, d, d) )
     {
     default:
         if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
@@ -4990,8 +5036,9 @@ static int ptwr_emulated_update(
         break;
     case 0:
         break;
-    case 1:
-        l1e_remove_flags(nl1e, _PAGE_RW);
+    case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+        ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+        l1e_flip_flags(nl1e, ret);
         break;
     }
 
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -157,6 +157,9 @@ static inline l4_pgentry_t l4e_from_padd
 #define l3e_remove_flags(x, flags) ((x).l3 &= ~put_pte_flags(flags))
 #define l4e_remove_flags(x, flags) ((x).l4 &= ~put_pte_flags(flags))
 
+/* Flip flags in an existing L1 PTE. */
+#define l1e_flip_flags(x, flags)    ((x).l1 ^= put_pte_flags(flags))
+
 /* Check if a pte's page mapping or significant access flags have changed. */
 #define l1e_has_changed(x,y,flags) \
     ( !!(((x).l1 ^ (y).l1) & ((PADDR_MASK&PAGE_MASK)|put_pte_flags(flags))) )

[-- Attachment #5: xsa154-4.5.patch --]
[-- Type: application/octet-stream, Size: 12122 bytes --]

x86: enforce consistent cachability of MMIO mappings

We've been told by Intel that inconsistent cachability between
multiple mappings of the same page can affect system stability only
when the affected page is an MMIO one. Since the stale data issue is
of no relevance to the hypervisor (since all guest memory accesses go
through proper accessors and validation), handling of RAM pages
remains unchanged here. Any MMIO mapped by domains however needs to be
done consistently (all cachable mappings or all uncachable ones), in
order to avoid Machine Check exceptions. Since converting existing
cachable mappings to uncachable (at the time an uncachable mapping
gets established) would in the PV case require tracking all mappings,
allow MMIO to only get mapped uncachable (UC, UC-, or WC).

This also implies that in the PV case we mustn't use the L1 PTE update
fast path when cachability flags get altered.

Since in the HVM case at least for now we want to continue honoring
pinned cachability attributes for pages not mapped by the hypervisor,
special case handling of r/o MMIO pages (forcing UC) gets added there.
Arguably the counterpart change to p2m-pt.c may not be necessary, since
UC- (which already gets enforced there) is probably strict enough.

Note that the shadow code changes include fixing the write protection
of r/o MMIO ranges: shadow_l1e_remove_flags() and its siblings, other
than l1e_remove_flags() and alike, return the new PTE (and hence
ignoring their return values makes them no-ops).

This is CVE-2016-2270 / XSA-154.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1000,6 +1000,15 @@ limit is ignored by Xen.
 
 Specify if the MMConfig space should be enabled.
 
+### mmio-relax
+> `= <boolean> | all`
+
+> Default: `false`
+
+By default, domains may not create cached mappings to MMIO regions.
+This option relaxes the check for Domain 0 (or when using `all`, all PV
+domains), to permit the use of cacheable MMIO mappings.
+
 ### msi
 > `= <boolean>`
 
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -810,8 +810,17 @@ int epte_get_entry_emt(struct domain *d,
     if ( v->domain != d )
         v = d->vcpu ? d->vcpu[0] : NULL;
 
-    if ( !mfn_valid(mfn_x(mfn)) )
+    if ( !mfn_valid(mfn_x(mfn)) ||
+         rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
+                                 mfn_x(mfn) + (1UL << order) - 1) )
+    {
+        *ipat = 1;
         return MTRR_TYPE_UNCACHABLE;
+    }
+
+    if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
+                                 mfn_x(mfn) + (1UL << order) - 1) )
+        return -1;
 
     switch ( hvm_get_mem_pinned_cacheattr(d, gfn, order, &type) )
     {
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -107,6 +107,8 @@ static unsigned long p2m_type_to_flags(p
     case p2m_mmio_direct:
         if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) )
             flags |= _PAGE_RW;
+        else
+            flags |= _PAGE_PWT;
         return flags | P2M_BASE_FLAGS | _PAGE_PCD;
     }
 }
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -521,6 +521,7 @@ _sh_propagate(struct vcpu *v,
     gfn_t target_gfn = guest_l1e_get_gfn(guest_entry);
     u32 pass_thru_flags;
     u32 gflags, sflags;
+    bool_t mmio_mfn;
 
     /* We don't shadow PAE l3s */
     ASSERT(GUEST_PAGING_LEVELS > 3 || level != 3);
@@ -561,7 +562,10 @@ _sh_propagate(struct vcpu *v,
     // mfn means that we can not usefully shadow anything, and so we
     // return early.
     //
-    if ( !mfn_valid(target_mfn)
+    mmio_mfn = !mfn_valid(target_mfn)
+               || (level == 1
+                   && page_get_owner(mfn_to_page(target_mfn)) == dom_io);
+    if ( mmio_mfn
          && !(level == 1 && (!shadow_mode_refcounts(d) 
                              || p2mt == p2m_mmio_direct)) )
     {
@@ -579,7 +583,7 @@ _sh_propagate(struct vcpu *v,
                        _PAGE_RW | _PAGE_PRESENT);
     if ( guest_supports_nx(v) )
         pass_thru_flags |= _PAGE_NX_BIT;
-    if ( !shadow_mode_refcounts(d) && !mfn_valid(target_mfn) )
+    if ( level == 1 && !shadow_mode_refcounts(d) && mmio_mfn )
         pass_thru_flags |= _PAGE_PAT | _PAGE_PCD | _PAGE_PWT;
     sflags = gflags & pass_thru_flags;
 
@@ -678,10 +682,14 @@ _sh_propagate(struct vcpu *v,
     }
 
     /* Read-only memory */
-    if ( p2m_is_readonly(p2mt) ||
-         (p2mt == p2m_mmio_direct &&
-          rangeset_contains_singleton(mmio_ro_ranges, mfn_x(target_mfn))) )
+    if ( p2m_is_readonly(p2mt) )
         sflags &= ~_PAGE_RW;
+    else if ( p2mt == p2m_mmio_direct &&
+              rangeset_contains_singleton(mmio_ro_ranges, mfn_x(target_mfn)) )
+    {
+        sflags &= ~(_PAGE_RW | _PAGE_PAT);
+        sflags |= _PAGE_PCD | _PAGE_PWT;
+    }
     
     // protect guest page tables
     //
@@ -1188,22 +1196,28 @@ static int shadow_set_l1e(struct vcpu *v
          && !sh_l1e_is_magic(new_sl1e) ) 
     {
         /* About to install a new reference */        
-        if ( shadow_mode_refcounts(d) ) {
+        if ( shadow_mode_refcounts(d) )
+        {
+#define PAGE_FLIPPABLE (_PAGE_RW | _PAGE_PWT | _PAGE_PCD | _PAGE_PAT)
+            int rc;
+
             TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_SHADOW_L1_GET_REF);
-            switch ( shadow_get_page_from_l1e(new_sl1e, d, new_type) )
+            switch ( rc = shadow_get_page_from_l1e(new_sl1e, d, new_type) )
             {
             default:
                 /* Doesn't look like a pagetable. */
                 flags |= SHADOW_SET_ERROR;
                 new_sl1e = shadow_l1e_empty();
                 break;
-            case 1:
-                shadow_l1e_remove_flags(new_sl1e, _PAGE_RW);
+            case PAGE_FLIPPABLE & -PAGE_FLIPPABLE ... PAGE_FLIPPABLE:
+                ASSERT(!(rc & ~PAGE_FLIPPABLE));
+                new_sl1e = shadow_l1e_flip_flags(new_sl1e, rc);
                 /* fall through */
             case 0:
                 shadow_vram_get_l1e(new_sl1e, sl1e, sl1mfn, d);
                 break;
             }
+#undef PAGE_FLIPPABLE
         }
     } 
 
--- a/xen/arch/x86/mm/shadow/types.h
+++ b/xen/arch/x86/mm/shadow/types.h
@@ -100,6 +100,9 @@ static inline u32 shadow_l4e_get_flags(s
 static inline shadow_l1e_t
 shadow_l1e_remove_flags(shadow_l1e_t sl1e, u32 flags)
 { l1e_remove_flags(sl1e, flags); return sl1e; }
+static inline shadow_l1e_t
+shadow_l1e_flip_flags(shadow_l1e_t sl1e, u32 flags)
+{ l1e_flip_flags(sl1e, flags); return sl1e; }
 
 static inline shadow_l1e_t shadow_l1e_empty(void) 
 { return l1e_empty(); }
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -181,6 +181,18 @@ static uint32_t base_disallow_mask;
       is_pv_domain(d)) ?                                        \
      L1_DISALLOW_MASK : (L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS))
 
+static s8 __read_mostly opt_mmio_relax;
+static void __init parse_mmio_relax(const char *s)
+{
+    if ( !*s )
+        opt_mmio_relax = 1;
+    else
+        opt_mmio_relax = parse_bool(s);
+    if ( opt_mmio_relax < 0 && strcmp(s, "all") )
+        opt_mmio_relax = 0;
+}
+custom_param("mmio-relax", parse_mmio_relax);
+
 static void __init init_frametable_chunk(void *start, void *end)
 {
     unsigned long s = (unsigned long)start;
@@ -766,6 +778,8 @@ get_page_from_l1e(
     if ( !mfn_valid(mfn) ||
          (real_pg_owner = page_get_owner_and_reference(page)) == dom_io )
     {
+        int flip = 0;
+
         /* Only needed the reference to confirm dom_io ownership. */
         if ( mfn_valid(mfn) )
             put_page(page);
@@ -798,13 +812,41 @@ get_page_from_l1e(
             return -EINVAL;
         }
 
-        if ( !(l1f & _PAGE_RW) ||
-             !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
-            return 0;
-        dprintk(XENLOG_G_WARNING,
-                "d%d: Forcing read-only access to MFN %lx\n",
-                l1e_owner->domain_id, mfn);
-        return 1;
+        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+        {
+            /* MMIO pages must not be mapped cachable unless requested so. */
+            switch ( opt_mmio_relax )
+            {
+            case 0:
+                break;
+            case 1:
+                if ( is_hardware_domain(l1e_owner) )
+            case -1:
+                    return 0;
+            default:
+                ASSERT_UNREACHABLE();
+            }
+        }
+        else if ( l1f & _PAGE_RW )
+        {
+            dprintk(XENLOG_G_WARNING,
+                    "d%d: Forcing read-only access to MFN %lx\n",
+                    l1e_owner->domain_id, mfn);
+            flip = _PAGE_RW;
+        }
+
+        switch ( l1f & PAGE_CACHE_ATTRS )
+        {
+        case 0: /* WB */
+            flip |= _PAGE_PWT | _PAGE_PCD;
+            break;
+        case _PAGE_PWT: /* WT */
+        case _PAGE_PWT | _PAGE_PAT: /* WP */
+            flip |= _PAGE_PCD | (l1f & _PAGE_PAT);
+            break;
+        }
+
+        return flip;
     }
 
     if ( unlikely( (real_pg_owner != pg_owner) &&
@@ -1194,8 +1236,9 @@ static int alloc_l1_table(struct page_in
                 goto fail;
             case 0:
                 break;
-            case 1:
-                l1e_remove_flags(pl1e[i], _PAGE_RW);
+            case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+                ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+                l1e_flip_flags(pl1e[i], ret);
                 break;
             }
 
@@ -1690,8 +1733,9 @@ static int mod_l1_entry(l1_pgentry_t *pl
             return -EINVAL;
         }
 
-        /* Fast path for identical mapping, r/w and presence. */
-        if ( !l1e_has_changed(ol1e, nl1e, _PAGE_RW | _PAGE_PRESENT) )
+        /* Fast path for identical mapping, r/w, presence, and cachability. */
+        if ( !l1e_has_changed(ol1e, nl1e,
+                              PAGE_CACHE_ATTRS | _PAGE_RW | _PAGE_PRESENT) )
         {
             adjust_guest_l1e(nl1e, pt_dom);
             if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
@@ -1714,8 +1758,9 @@ static int mod_l1_entry(l1_pgentry_t *pl
             return rc;
         case 0:
             break;
-        case 1:
-            l1e_remove_flags(nl1e, _PAGE_RW);
+        case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+            ASSERT(!(rc & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+            l1e_flip_flags(nl1e, rc);
             rc = 0;
             break;
         }
@@ -4920,6 +4965,7 @@ static int ptwr_emulated_update(
     l1_pgentry_t pte, ol1e, nl1e, *pl1e;
     struct vcpu *v = current;
     struct domain *d = v->domain;
+    int ret;
 
     /* Only allow naturally-aligned stores within the original %cr2 page. */
     if ( unlikely(((addr^ptwr_ctxt->cr2) & PAGE_MASK) || (addr & (bytes-1))) )
@@ -4967,7 +5013,7 @@ static int ptwr_emulated_update(
 
     /* Check the new PTE. */
     nl1e = l1e_from_intpte(val);
-    switch ( get_page_from_l1e(nl1e, d, d) )
+    switch ( ret = get_page_from_l1e(nl1e, d, d) )
     {
     default:
         if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
@@ -4991,8 +5037,9 @@ static int ptwr_emulated_update(
         break;
     case 0:
         break;
-    case 1:
-        l1e_remove_flags(nl1e, _PAGE_RW);
+    case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+        ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+        l1e_flip_flags(nl1e, ret);
         break;
     }
 
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -157,6 +157,9 @@ static inline l4_pgentry_t l4e_from_padd
 #define l3e_remove_flags(x, flags) ((x).l3 &= ~put_pte_flags(flags))
 #define l4e_remove_flags(x, flags) ((x).l4 &= ~put_pte_flags(flags))
 
+/* Flip flags in an existing L1 PTE. */
+#define l1e_flip_flags(x, flags)    ((x).l1 ^= put_pte_flags(flags))
+
 /* Check if a pte's page mapping or significant access flags have changed. */
 #define l1e_has_changed(x,y,flags) \
     ( !!(((x).l1 ^ (y).l1) & ((PADDR_MASK&PAGE_MASK)|put_pte_flags(flags))) )

[-- Attachment #6: xsa154-4.6.patch --]
[-- Type: application/octet-stream, Size: 13247 bytes --]

x86: enforce consistent cachability of MMIO mappings

We've been told by Intel that inconsistent cachability between
multiple mappings of the same page can affect system stability only
when the affected page is an MMIO one. Since the stale data issue is
of no relevance to the hypervisor (since all guest memory accesses go
through proper accessors and validation), handling of RAM pages
remains unchanged here. Any MMIO mapped by domains however needs to be
done consistently (all cachable mappings or all uncachable ones), in
order to avoid Machine Check exceptions. Since converting existing
cachable mappings to uncachable (at the time an uncachable mapping
gets established) would in the PV case require tracking all mappings,
allow MMIO to only get mapped uncachable (UC, UC-, or WC).

This also implies that in the PV case we mustn't use the L1 PTE update
fast path when cachability flags get altered.

Since in the HVM case at least for now we want to continue honoring
pinned cachability attributes for pages not mapped by the hypervisor,
special case handling of r/o MMIO pages (forcing UC) gets added there.
Arguably the counterpart change to p2m-pt.c may not be necessary, since
UC- (which already gets enforced there) is probably strict enough.

Note that the shadow code changes include fixing the write protection
of r/o MMIO ranges: shadow_l1e_remove_flags() and its siblings, other
than l1e_remove_flags() and alike, return the new PTE (and hence
ignoring their return values makes them no-ops).

This is CVE-2016-2270 / XSA-154.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1080,6 +1080,15 @@ limit is ignored by Xen.
 
 Specify if the MMConfig space should be enabled.
 
+### mmio-relax
+> `= <boolean> | all`
+
+> Default: `false`
+
+By default, domains may not create cached mappings to MMIO regions.
+This option relaxes the check for Domain 0 (or when using `all`, all PV
+domains), to permit the use of cacheable MMIO mappings.
+
 ### msi
 > `= <boolean>`
 
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -807,8 +807,17 @@ int epte_get_entry_emt(struct domain *d,
     if ( v->domain != d )
         v = d->vcpu ? d->vcpu[0] : NULL;
 
-    if ( !mfn_valid(mfn_x(mfn)) )
+    if ( !mfn_valid(mfn_x(mfn)) ||
+         rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
+                                 mfn_x(mfn) + (1UL << order) - 1) )
+    {
+        *ipat = 1;
         return MTRR_TYPE_UNCACHABLE;
+    }
+
+    if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
+                                 mfn_x(mfn) + (1UL << order) - 1) )
+        return -1;
 
     switch ( hvm_get_mem_pinned_cacheattr(d, gfn, order, &type) )
     {
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -107,6 +107,8 @@ static unsigned long p2m_type_to_flags(p
     case p2m_mmio_direct:
         if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) )
             flags |= _PAGE_RW;
+        else
+            flags |= _PAGE_PWT;
         return flags | P2M_BASE_FLAGS | _PAGE_PCD;
     }
 }
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -519,6 +519,7 @@ _sh_propagate(struct vcpu *v,
     gfn_t target_gfn = guest_l1e_get_gfn(guest_entry);
     u32 pass_thru_flags;
     u32 gflags, sflags;
+    bool_t mmio_mfn;
 
     /* We don't shadow PAE l3s */
     ASSERT(GUEST_PAGING_LEVELS > 3 || level != 3);
@@ -559,7 +560,10 @@ _sh_propagate(struct vcpu *v,
     // mfn means that we can not usefully shadow anything, and so we
     // return early.
     //
-    if ( !mfn_valid(target_mfn)
+    mmio_mfn = !mfn_valid(target_mfn)
+               || (level == 1
+                   && page_get_owner(mfn_to_page(target_mfn)) == dom_io);
+    if ( mmio_mfn
          && !(level == 1 && (!shadow_mode_refcounts(d)
                              || p2mt == p2m_mmio_direct)) )
     {
@@ -577,7 +581,7 @@ _sh_propagate(struct vcpu *v,
                        _PAGE_RW | _PAGE_PRESENT);
     if ( guest_supports_nx(v) )
         pass_thru_flags |= _PAGE_NX_BIT;
-    if ( !shadow_mode_refcounts(d) && !mfn_valid(target_mfn) )
+    if ( level == 1 && !shadow_mode_refcounts(d) && mmio_mfn )
         pass_thru_flags |= _PAGE_PAT | _PAGE_PCD | _PAGE_PWT;
     sflags = gflags & pass_thru_flags;
 
@@ -676,10 +680,14 @@ _sh_propagate(struct vcpu *v,
     }
 
     /* Read-only memory */
-    if ( p2m_is_readonly(p2mt) ||
-         (p2mt == p2m_mmio_direct &&
-          rangeset_contains_singleton(mmio_ro_ranges, mfn_x(target_mfn))) )
+    if ( p2m_is_readonly(p2mt) )
         sflags &= ~_PAGE_RW;
+    else if ( p2mt == p2m_mmio_direct &&
+              rangeset_contains_singleton(mmio_ro_ranges, mfn_x(target_mfn)) )
+    {
+        sflags &= ~(_PAGE_RW | _PAGE_PAT);
+        sflags |= _PAGE_PCD | _PAGE_PWT;
+    }
 
     // protect guest page tables
     //
@@ -1185,22 +1193,28 @@ static int shadow_set_l1e(struct domain
          && !sh_l1e_is_magic(new_sl1e) )
     {
         /* About to install a new reference */
-        if ( shadow_mode_refcounts(d) ) {
+        if ( shadow_mode_refcounts(d) )
+        {
+#define PAGE_FLIPPABLE (_PAGE_RW | _PAGE_PWT | _PAGE_PCD | _PAGE_PAT)
+            int rc;
+
             TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_SHADOW_L1_GET_REF);
-            switch ( shadow_get_page_from_l1e(new_sl1e, d, new_type) )
+            switch ( rc = shadow_get_page_from_l1e(new_sl1e, d, new_type) )
             {
             default:
                 /* Doesn't look like a pagetable. */
                 flags |= SHADOW_SET_ERROR;
                 new_sl1e = shadow_l1e_empty();
                 break;
-            case 1:
-                shadow_l1e_remove_flags(new_sl1e, _PAGE_RW);
+            case PAGE_FLIPPABLE & -PAGE_FLIPPABLE ... PAGE_FLIPPABLE:
+                ASSERT(!(rc & ~PAGE_FLIPPABLE));
+                new_sl1e = shadow_l1e_flip_flags(new_sl1e, rc);
                 /* fall through */
             case 0:
                 shadow_vram_get_l1e(new_sl1e, sl1e, sl1mfn, d);
                 break;
             }
+#undef PAGE_FLIPPABLE
         }
     }
 
--- a/xen/arch/x86/mm/shadow/types.h
+++ b/xen/arch/x86/mm/shadow/types.h
@@ -99,6 +99,9 @@ static inline u32 shadow_l4e_get_flags(s
 static inline shadow_l1e_t
 shadow_l1e_remove_flags(shadow_l1e_t sl1e, u32 flags)
 { l1e_remove_flags(sl1e, flags); return sl1e; }
+static inline shadow_l1e_t
+shadow_l1e_flip_flags(shadow_l1e_t sl1e, u32 flags)
+{ l1e_flip_flags(sl1e, flags); return sl1e; }
 
 static inline shadow_l1e_t shadow_l1e_empty(void)
 { return l1e_empty(); }
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -178,6 +178,18 @@ static uint32_t base_disallow_mask;
       is_pv_domain(d)) ?                                        \
      L1_DISALLOW_MASK : (L1_DISALLOW_MASK & ~PAGE_CACHE_ATTRS))
 
+static s8 __read_mostly opt_mmio_relax;
+static void __init parse_mmio_relax(const char *s)
+{
+    if ( !*s )
+        opt_mmio_relax = 1;
+    else
+        opt_mmio_relax = parse_bool(s);
+    if ( opt_mmio_relax < 0 && strcmp(s, "all") )
+        opt_mmio_relax = 0;
+}
+custom_param("mmio-relax", parse_mmio_relax);
+
 static void __init init_frametable_chunk(void *start, void *end)
 {
     unsigned long s = (unsigned long)start;
@@ -799,10 +811,7 @@ get_page_from_l1e(
     if ( !mfn_valid(mfn) ||
          (real_pg_owner = page_get_owner_and_reference(page)) == dom_io )
     {
-#ifndef NDEBUG
-        const unsigned long *ro_map;
-        unsigned int seg, bdf;
-#endif
+        int flip = 0;
 
         /* Only needed the reference to confirm dom_io ownership. */
         if ( mfn_valid(mfn) )
@@ -836,24 +845,55 @@ get_page_from_l1e(
             return -EINVAL;
         }
 
-        if ( !(l1f & _PAGE_RW) ||
-             !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
-            return 0;
+        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+        {
+            /* MMIO pages must not be mapped cachable unless requested so. */
+            switch ( opt_mmio_relax )
+            {
+            case 0:
+                break;
+            case 1:
+                if ( is_hardware_domain(l1e_owner) )
+            case -1:
+                    return 0;
+            default:
+                ASSERT_UNREACHABLE();
+            }
+        }
+        else if ( l1f & _PAGE_RW )
+        {
 #ifndef NDEBUG
-        if ( !pci_mmcfg_decode(mfn, &seg, &bdf) ||
-             ((ro_map = pci_get_ro_map(seg)) != NULL &&
-              test_bit(bdf, ro_map)) )
-            printk(XENLOG_G_WARNING
-                   "d%d: Forcing read-only access to MFN %lx\n",
-                   l1e_owner->domain_id, mfn);
-        else
-            rangeset_report_ranges(mmio_ro_ranges, 0, ~0UL,
-                                   print_mmio_emul_range,
-                                   &(struct mmio_emul_range_ctxt){
-                                      .d = l1e_owner,
-                                      .mfn = mfn });
+            const unsigned long *ro_map;
+            unsigned int seg, bdf;
+
+            if ( !pci_mmcfg_decode(mfn, &seg, &bdf) ||
+                 ((ro_map = pci_get_ro_map(seg)) != NULL &&
+                  test_bit(bdf, ro_map)) )
+                printk(XENLOG_G_WARNING
+                       "d%d: Forcing read-only access to MFN %lx\n",
+                       l1e_owner->domain_id, mfn);
+            else
+                rangeset_report_ranges(mmio_ro_ranges, 0, ~0UL,
+                                       print_mmio_emul_range,
+                                       &(struct mmio_emul_range_ctxt){
+                                           .d = l1e_owner,
+                                           .mfn = mfn });
 #endif
-        return 1;
+            flip = _PAGE_RW;
+        }
+
+        switch ( l1f & PAGE_CACHE_ATTRS )
+        {
+        case 0: /* WB */
+            flip |= _PAGE_PWT | _PAGE_PCD;
+            break;
+        case _PAGE_PWT: /* WT */
+        case _PAGE_PWT | _PAGE_PAT: /* WP */
+            flip |= _PAGE_PCD | (l1f & _PAGE_PAT);
+            break;
+        }
+
+        return flip;
     }
 
     if ( unlikely( (real_pg_owner != pg_owner) &&
@@ -1243,8 +1283,9 @@ static int alloc_l1_table(struct page_in
                 goto fail;
             case 0:
                 break;
-            case 1:
-                l1e_remove_flags(pl1e[i], _PAGE_RW);
+            case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+                ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+                l1e_flip_flags(pl1e[i], ret);
                 break;
             }
 
@@ -1759,8 +1800,9 @@ static int mod_l1_entry(l1_pgentry_t *pl
             return -EINVAL;
         }
 
-        /* Fast path for identical mapping, r/w and presence. */
-        if ( !l1e_has_changed(ol1e, nl1e, _PAGE_RW | _PAGE_PRESENT) )
+        /* Fast path for identical mapping, r/w, presence, and cachability. */
+        if ( !l1e_has_changed(ol1e, nl1e,
+                              PAGE_CACHE_ATTRS | _PAGE_RW | _PAGE_PRESENT) )
         {
             adjust_guest_l1e(nl1e, pt_dom);
             if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
@@ -1783,8 +1825,9 @@ static int mod_l1_entry(l1_pgentry_t *pl
             return rc;
         case 0:
             break;
-        case 1:
-            l1e_remove_flags(nl1e, _PAGE_RW);
+        case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+            ASSERT(!(rc & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+            l1e_flip_flags(nl1e, rc);
             rc = 0;
             break;
         }
@@ -5000,6 +5043,7 @@ static int ptwr_emulated_update(
     l1_pgentry_t pte, ol1e, nl1e, *pl1e;
     struct vcpu *v = current;
     struct domain *d = v->domain;
+    int ret;
 
     /* Only allow naturally-aligned stores within the original %cr2 page. */
     if ( unlikely(((addr^ptwr_ctxt->cr2) & PAGE_MASK) || (addr & (bytes-1))) )
@@ -5047,7 +5091,7 @@ static int ptwr_emulated_update(
 
     /* Check the new PTE. */
     nl1e = l1e_from_intpte(val);
-    switch ( get_page_from_l1e(nl1e, d, d) )
+    switch ( ret = get_page_from_l1e(nl1e, d, d) )
     {
     default:
         if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
@@ -5071,8 +5115,9 @@ static int ptwr_emulated_update(
         break;
     case 0:
         break;
-    case 1:
-        l1e_remove_flags(nl1e, _PAGE_RW);
+    case _PAGE_RW ... _PAGE_RW | PAGE_CACHE_ATTRS:
+        ASSERT(!(ret & ~(_PAGE_RW | PAGE_CACHE_ATTRS)));
+        l1e_flip_flags(nl1e, ret);
         break;
     }
 
--- a/xen/include/asm-x86/page.h
+++ b/xen/include/asm-x86/page.h
@@ -157,6 +157,9 @@ static inline l4_pgentry_t l4e_from_padd
 #define l3e_remove_flags(x, flags) ((x).l3 &= ~put_pte_flags(flags))
 #define l4e_remove_flags(x, flags) ((x).l4 &= ~put_pte_flags(flags))
 
+/* Flip flags in an existing L1 PTE. */
+#define l1e_flip_flags(x, flags)    ((x).l1 ^= put_pte_flags(flags))
+
 /* Check if a pte's page mapping or significant access flags have changed. */
 #define l1e_has_changed(x,y,flags) \
     ( !!(((x).l1 ^ (y).l1) & ((PADDR_MASK&PAGE_MASK)|put_pte_flags(flags))) )

[-- Attachment #7: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Xen Security Advisory 154 (CVE-2016-2270) - x86: inconsistent cachability flags on guest mappings
  2016-02-17 12:28 Xen Security Advisory 154 (CVE-2016-2270) - x86: inconsistent cachability flags on guest mappings Xen.org security team
@ 2017-01-25 14:08 ` David Woodhouse
  2017-01-25 14:21   ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: David Woodhouse @ 2017-01-25 14:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, H. Peter Anvin, Jan Beulich


[-- Attachment #1.1: Type: text/plain, Size: 4752 bytes --]

> x86: enforce consistent cachability of MMIO mappings
> 
> We've been told by Intel that inconsistent cachability between
> multiple mappings of the same page can affect system stability only
> when the affected page is an MMIO one. Since the stale data issue is
> of no relevance to the hypervisor (since all guest memory accesses go
> through proper accessors and validation), handling of RAM pages
> remains unchanged here. Any MMIO mapped by domains however needs to be
> done consistently (all cachable mappings or all uncachable ones), in
> order to avoid Machine Check exceptions. Since converting existing
> cachable mappings to uncachable (at the time an uncachable mapping
> gets established) would in the PV case require tracking all mappings,
> allow MMIO to only get mapped uncachable (UC, UC-, or WC).
> 
> This also implies that in the PV case we mustn't use the L1 PTE update
> fast path when cachability flags get altered.
> 
> Since in the HVM case at least for now we want to continue honoring
> pinned cachability attributes for pages not mapped by the hypervisor,
> special case handling of r/o MMIO pages (forcing UC) gets added there.
> Arguably the counterpart change to p2m-pt.c may not be necessary, since
> UC- (which already gets enforced there) is probably strict enough.
> 
> Note that the shadow code changes include fixing the write protection
> of r/o MMIO ranges: shadow_l1e_remove_flags() and its siblings, unlike
> l1e_remove_flags() and the like, return the new PTE (and hence
> ignoring their return values makes them no-ops).
> 
> This is CVE-2016-2270 / XSA-154.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

...

> --- a/xen/arch/x86/hvm/mtrr.c
> +++ b/xen/arch/x86/hvm/mtrr.c
> @@ -770,8 +770,17 @@ int epte_get_entry_emt(struct domain *d,
>      if ( v->domain != d )
>          v = d->vcpu ? d->vcpu[0] : NULL;
>  
> -    if ( !mfn_valid(mfn_x(mfn)) )
> +    if ( !mfn_valid(mfn_x(mfn)) ||
> +         rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
> +                                 mfn_x(mfn) + (1UL << order) - 1) )
> +    {
> +        *ipat = 1;
>          return MTRR_TYPE_UNCACHABLE;
> +    }
> +
> +    if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
> +                                 mfn_x(mfn) + (1UL << order) - 1) )
> +        return -1;
>  
>      switch ( hvm_get_mem_pinned_cacheattr(d, gfn, order, &type) )
>      {

This doesn't look right. That second 'if(rangeset_overlaps_range(…))'
is tautologically false, because if it *is* true, the first if()
statement happens first and it's never reached.

The reason I'm looking is because that first if() statement is
happening for MMIO regions where it probably shouldn't. This means that
guests are mapping MMIO BARs of assigned devices and getting *forced*
UC (because *ipat=1) instead of taking the if(direct_mmio) path
slightly further down — which wouldn't set the 'ignore PAT' bit, and
would thus allow them to enable WC through their PAT.
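
To make the effect concrete, here is a minimal sketch of my reading of
the combining rules for the UC case only. The function and its name are
made up for illustration; this is not Xen (or SDM) code:

    /* Effective type seen by the guest when the EPT memory type is UC. */
    static unsigned int effective_uc_emt(bool_t ipat, unsigned int pat_type)
    {
        if ( ipat )
            /* 'ignore PAT' is set: the guest's PAT cannot relax anything. */
            return MTRR_TYPE_UNCACHABLE;

        /* ipat clear: a WC entry in the guest's PAT yields WC; any other
         * PAT type leaves the mapping UC. */
        return pat_type == MTRR_TYPE_WRCOMB ? MTRR_TYPE_WRCOMB
                                            : MTRR_TYPE_UNCACHABLE;
    }

So with *ipat=1 the guest can never get WC, while the direct_mmio path
(UC with ipat left clear) still lets it ask for WC via its PAT.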

It makes me wonder if the first was actually intended to be
'!mfn_valid() && rangeset_contains_range(…)' — with logical && rather
than ||. That would make some sense because it's then explicitly
refusing to map pages which are in mmio_ro_ranges *and* mfn_valid().

And then there's a 'if (direct_mmio) return UC;' further down which
looks like it'd do the right thing for the use case I'm actually
testing. I may see if I can construct a straw man patch, but I'm kind
of unfamiliar with this code so it should be taken with a large pinch
of salt...

There is a separate question of whether it's actually safe to let the
guest map an MMIO page with both UC and WC simultaneously. Empirically,
it seems to be OK — I hacked a guest kernel not to enforce the mutual
exclusion, mapped the BAR with both UC and WC and ran two kernel
threads, reading and writing the whole BAR in a number of iterations.
The WC thread went a lot faster than the UC one, so it will have often
been touching the same locations as the UC thread as it 'overtook' it,
and nothing bad happened. This seems reasonable, as the dire warnings
and machine checks are more about *cached* vs. uncached mappings, not
WC vs. UC. But it would be good to have a definitive answer from Intel
and AMD about whether it's safe.
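
For reference, the guest-side test was roughly the following shape. This
is a from-memory sketch rather than the actual module: the BAR address
and size are made up, the hack that defeats the kernel's memtype conflict
check isn't shown, and cleanup is omitted:

    #include <linux/module.h>
    #include <linux/kthread.h>
    #include <linux/io.h>

    #define BAR_PHYS 0xf0000000UL  /* made-up BAR address */
    #define BAR_SIZE 0x00100000UL  /* made-up BAR size */

    /* Read and write the whole BAR repeatedly through one mapping. */
    static int hammer(void *mapping)
    {
        void __iomem *bar = mapping;
        unsigned long off;
        int i;

        for (i = 0; i < 100 && !kthread_should_stop(); i++)
            for (off = 0; off < BAR_SIZE; off += 4)
                writel(readl(bar + off) ^ 0xffffffff, bar + off);
        return 0;
    }

    static int __init wcuc_init(void)
    {
        /* Two mappings of the same BAR, one UC and one WC. */
        void __iomem *uc = ioremap_uc(BAR_PHYS, BAR_SIZE);
        void __iomem *wc = ioremap_wc(BAR_PHYS, BAR_SIZE);

        if (!uc || !wc)
            return -ENOMEM;
        kthread_run(hammer, uc, "bar-uc");
        kthread_run(hammer, wc, "bar-wc");
        return 0;
    }
    module_init(wcuc_init);
    MODULE_LICENSE("GPL");

The WC thread "overtaking" the UC one is what produced the concurrent
accesses to the same locations mentioned above.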

-- 
dwmw2

¹ Or is that the problem — is mfn_valid() supposed to be true for MMIO
BARs, and this is actually an SR-IOV problem with resource assignment
failing to do that for VFs?

[-- Attachment #1.2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4938 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Xen Security Advisory 154 (CVE-2016-2270) - x86: inconsistent cachability flags on guest mappings
  2017-01-25 14:08 ` David Woodhouse
@ 2017-01-25 14:21   ` Jan Beulich
  2017-01-25 14:34     ` David Woodhouse
                       ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Jan Beulich @ 2017-01-25 14:21 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Andrew Cooper, H. Peter Anvin, xen-devel

>>> On 25.01.17 at 15:08, <dwmw2@infradead.org> wrote:
>> --- a/xen/arch/x86/hvm/mtrr.c
>> +++ b/xen/arch/x86/hvm/mtrr.c
>> @@ -770,8 +770,17 @@ int epte_get_entry_emt(struct domain *d,
>>      if ( v->domain != d )
>>          v = d->vcpu ? d->vcpu[0] : NULL;
>>  
>> -    if ( !mfn_valid(mfn_x(mfn)) )
>> +    if ( !mfn_valid(mfn_x(mfn)) ||
>> +         rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
>> +                                 mfn_x(mfn) + (1UL << order) - 1) )
>> +    {
>> +        *ipat = 1;
>>          return MTRR_TYPE_UNCACHABLE;
>> +    }
>> +
>> +    if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
>> +                                 mfn_x(mfn) + (1UL << order) - 1) )
>> +        return -1;
>>  
>>      switch ( hvm_get_mem_pinned_cacheattr(d, gfn, order, &type) )
>>      {
> 
> This doesn't look right. That second 'if(rangeset_overlaps_range(…))'
> is tautologically false, because if it *is* true, the first if()
> statement happens first and it's never reached.

Note the difference between "contains" and "overlaps".
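
Concretely (numbers made up): take an order-9 mapping at MFN 0x100000,
i.e. the range [0x100000,0x1001ff], and suppose mmio_ro_ranges holds
only the single page 0x100080. Then

    rangeset_contains_range(mmio_ro_ranges, 0x100000, 0x1001ff) /* false */
    rangeset_overlaps_range(mmio_ro_ranges, 0x100000, 0x1001ff) /* true  */

so the first if() is skipped and the second one returns -1; the forced-UC
path is only taken when the whole range is r/o (or !mfn_valid()).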

> The reason I'm looking is because that first if() statement is
> happening for MMIO regions where it probably shouldn't. This means that
> guests are mapping MMIO BARs of assigned devices and getting *forced*
> UC (because *ipat=1) instead of taking the if(direct_mmio) path
> slightly further down — which wouldn't set the 'ignore PAT' bit, and
> would thus allow them to enable WC through their PAT.
> 
> It makes me wonder if the first was actually intended to be
> '!mfn_valid() && rangeset_contains_range(…)' — with logical && rather
> than ||. That would make some sense because it's then explicitly
> refusing to map pages which are in mmio_ro_ranges *and* mfn_valid().

No, this surely wasn't the intention. As Andrew has tried to
explain on irc, the only valid implication is !mfn_valid() -> MMIO.

> And then there's a 'if (direct_mmio) return UC;' further down which
> looks like it'd do the right thing for the use case I'm actually
> testing. I may see if I can construct a straw man patch, but I'm kind
> of unfamiliar with this code so it should be taken with a large pinch
> of salt...

If there wasn't INVALID_MFN to be taken care of, the !mfn_valid()
check could simply move down, and be combined with the
direct_mmio one.

> There is a separate question of whether it's actually safe to let the
> guest map an MMIO page with both UC and WC simultaneously. Empirically,
> it seems to be OK — I hacked a guest kernel not to enforce the mutual
> exclusion, mapped the BAR with both UC and WC and ran two kernel
> threads, reading and writing the whole BAR in a number of iterations.
> The WC thread went a lot faster than the UC one, so it will have often
> been touching the same locations as the UC thread as it 'overtook' it,
> and nothing bad happened. This seems reasonable, as the dire warnings
> and machine checks are more about *cached* vs. uncached mappings, not
> WC vs. UC. But it would be good to have a definitive answer from Intel
> and AMD about whether it's safe.

Well, in the context of this XSA we've asked both of them, and iirc
we've got a vague reply from Intel and none from AMD. In fact we
did defer the XSA for quite a bit waiting for any useful feedback.
To AMD's advantage I'd like to add though that iirc they're a little
more clear in their PM about the specific question of UC and WC
you raise: They group the various cacheabilities into two groups
(cacheable and uncacheable) and require there to only not be
any mixture between groups. Iirc Intel's somewhat vague reply
allowed us to conclude we're likely safe that way on their side too.
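
Roughly, in my words rather than a quote from either manual:

    cacheable   : WB, WT, WP
    uncacheable : UC, UC-, WC

A UC mapping alongside a WC one therefore stays within the uncacheable
group, which is also why the XSA-154 patch allows MMIO to be mapped UC,
UC-, or WC but nothing cacheable.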

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Xen Security Advisory 154 (CVE-2016-2270) - x86: inconsistent cachability flags on guest mappings
  2017-01-25 14:21   ` Jan Beulich
@ 2017-01-25 14:34     ` David Woodhouse
  2017-01-25 16:08     ` David Woodhouse
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: David Woodhouse @ 2017-01-25 14:34 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, H. Peter Anvin, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1667 bytes --]

On Wed, 2017-01-25 at 07:21 -0700, Jan Beulich wrote:
> Note the difference between "contains" and "overlaps".

Oh, that makes more sense; thanks.

> > And then there's a 'if (direct_mmio) return UC;' further down which
> > looks like it'd do the right thing for the use case I'm actually
> > testing. I may see if I can construct a straw man patch, but I'm kind
> > of unfamiliar with this code so it should be taken with a large pinch
> > of salt...
>
> If there wasn't INVALID_MFN to be taken care of, the !mfn_valid()
> check could simply move down, and be combined with the
> direct_mmio one.

I'll see what I can come up with.

> > WC vs. UC. But it would be good to have a definitive answer from Intel
> > and AMD about whether it's safe.
>
> Well, in the context of this XSA we've asked both of them, and iirc
> we've got a vague reply from Intel and none from AMD. In fact we
> did defer the XSA for quite a bit waiting for any useful feedback.
> To AMD's advantage I'd like to add though that iirc they're a little
> more clear in their PM about the specific question of UC and WC
> you raise: They group the various cacheabilities into two groups
> (cacheable and uncacheable) and require there to only not be
> any mixture between groups. Iirc Intel's somewhat vague reply
> allowed us to conclude we're likely safe that way on their side too.

That's useful information; thanks. Coupled with the empirical testing
and the fact that we're already "violating" the recommendation of the
Intel SDM by allowing a mixture of memory types on RAM pages, I think
that's probably sufficient to conclude that it's OK to permit this.

[-- Attachment #1.2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4938 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Xen Security Advisory 154 (CVE-2016-2270) - x86: inconsistent cachability flags on guest mappings
  2017-01-25 14:21   ` Jan Beulich
  2017-01-25 14:34     ` David Woodhouse
@ 2017-01-25 16:08     ` David Woodhouse
  2017-01-26  8:57     ` [PATCH] x86: Allow write-combining on MMIO mappings again David Woodhouse
  2017-02-01 20:23     ` Xen Security Advisory 154 (CVE-2016-2270) - x86: inconsistent cachability flags on guest mappings David Woodhouse
  3 siblings, 0 replies; 21+ messages in thread
From: David Woodhouse @ 2017-01-25 16:08 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, H. Peter Anvin, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 3094 bytes --]

On Wed, 2017-01-25 at 07:21 -0700, Jan Beulich wrote:
> 
> If there wasn't INVALID_MFN to be taken care of, the !mfn_valid()
> check could simply move down, and be combined with the
> direct_mmio one.

As discussed on IRC, once we fix the overflow with order > 0, I think
INVALID_MFN is OK. Not that it should ever really happen, AFAICT. 
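
Spelling out the overflow with order == 9 as the example, for
mfn == INVALID_MFN == ~0UL:

    mfn_x(mfn) + (1UL << order) - 1   /* wraps to 0x1fe, start > end   */
    mfn_x(mfn) | ((1UL << order) - 1) /* stays ~0UL, degenerate but OK */

which is why the checks in the patch below mask instead of adding.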

This seems to do the right thing for my MMIO WC test. I haven't
actually combined the !mfn_valid() check with the direct_mmio one.
Under what circumstances does that make sense anyway? For now in the
patch below I've left it *forcing* UC, unlike the direct_mmio path
which lets the guest use WC. But really, shouldn't the '!direct_mmio &&
!mfn_valid()' case just return an error?

Note that as well as using a mask for 'order' I've attempted to
consolidate the unlikely rangeset_overlaps_range() and
rangeset_contains_range() calls, on the assumption that the former will
*always* be true if the latter is, so we only need one of them in the
fast path through the function.

diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
index 709759c..09c2f5c 100644
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -773,20 +773,24 @@ int epte_get_entry_emt(struct domain *d, unsigned long gfn, mfn_t mfn,
     if ( v->domain != d )
         v = d->vcpu ? d->vcpu[0] : NULL;
 
-    if ( !mfn_valid(mfn_x(mfn)) ||
-         rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
-                                 mfn_x(mfn) + (1UL << order) - 1) )
+    /* INVALID_MFN should never happen here, but if it does then this
+     * should do the right thing instead of exploding */
+    if ( unlikely(rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
+					  mfn_x(mfn) | ((1UL << order) - 1))) )
     {
-        *ipat = 1;
-        return MTRR_TYPE_UNCACHABLE;
+	if ( rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
+				     mfn_x(mfn) | ((1UL << order) - 1)) )
+	{
+	    *ipat = 1;
+	    return MTRR_TYPE_UNCACHABLE;
+	}
+	/* Overlaps mmio_ro_ranges but is not entirely contained. No. */
+	return -1;
     }
 
-    if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
-                                 mfn_x(mfn) + (1UL << order) - 1) )
-        return -1;
-
     if ( direct_mmio )
     {
+	/* Again, INVALID_MFN should do the right thing here. */
         if ( (mfn_x(mfn) ^ d->arch.hvm_domain.vmx.apic_access_mfn) >> order )
             return MTRR_TYPE_UNCACHABLE;
         if ( order )
@@ -795,6 +799,12 @@ int epte_get_entry_emt(struct domain *d, unsigned long gfn, mfn_t mfn,
         return MTRR_TYPE_WRBACK;
     }
 
+    if ( unlikely(!mfn_valid(mfn_x(mfn))) )
+    {
+	*ipat = 1;
+	return MTRR_TYPE_UNCACHABLE;
+    }
+
     if ( !need_iommu(d) && !cache_flush_permitted(d) )
     {
         *ipat = 1;

[-- Attachment #1.2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4938 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH] x86: Allow write-combining on MMIO mappings again
  2017-01-25 14:21   ` Jan Beulich
  2017-01-25 14:34     ` David Woodhouse
  2017-01-25 16:08     ` David Woodhouse
@ 2017-01-26  8:57     ` David Woodhouse
  2017-01-26 10:45       ` Jan Beulich
  2017-01-26 14:50       ` [PATCH v3] " David Woodhouse
  2017-02-01 20:23     ` Xen Security Advisory 154 (CVE-2016-2270) - x86: inconsistent cachability flags on guest mappings David Woodhouse
  3 siblings, 2 replies; 21+ messages in thread
From: David Woodhouse @ 2017-01-26  8:57 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, H. Peter Anvin, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 3634 bytes --]

From: David Woodhouse <dwmw@amazon.com>

Since the fix for XSA-154 in commit c61a6f74f80e ("x86: enforce
consistent cachability of MMIO mappings"), HVM guests have no longer
been able to use PAT to obtain write-combining on MMIO because the
'ignore PAT' bit is set in EPT.

For MMIO we shouldn't be setting the 'ignore PAT' bit, although we
probably want to err on the side of caution and still do so for
addresses in mmio_ro_ranges. That necessitates a slight refactoring
to let the MMIO case (for which mfn_valid() can be false) get through
to the right code path.

Since we're not bailing out for !mfn_valid() immediately, the range
checks need to be adjusted to cope — simply by masking in the low bits
to account for 'order' instead of adding, to avoid overflow when the
mfn is INVALID_MFN (which happens on unmap, since we carefully call
this function to fill in the EMT even though the PTE won't be valid).

The range checks are also slightly refactored to put only one of them
in the fast path in the common case. If it doesn't overlap, then it
*definitely* isn't contained, so we don't need both checks. And if it
overlaps and is only one page, then it definitely *is* contained.

Finally, add a comment clarifying how that 'return -1' works — it isn't
returning an error and causing the mapping to fail; it relies on
resolve_misconfig() being able to split the mapping later. So it's
*only* sane to do it where order>0 and the 'problem' will be solved
by splitting the large page. Not for blindly returning 'error', which
I was tempted to do in my first attempt.

Signed-off-by: David Woodhouse <dwmw@amazon.com>
---
 xen/arch/x86/hvm/mtrr.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
index 709759c..41ae8b4 100644
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -773,18 +773,20 @@ int epte_get_entry_emt(struct domain *d, unsigned long gfn, mfn_t mfn,
     if ( v->domain != d )
         v = d->vcpu ? d->vcpu[0] : NULL;
 
-    if ( !mfn_valid(mfn_x(mfn)) ||
-         rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
-                                 mfn_x(mfn) + (1UL << order) - 1) )
+    /* Mask, not add, for order so it works with INVALID_MFN on unmapping */
+    if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
+				 mfn_x(mfn) | ((1UL << order) - 1)) )
     {
-        *ipat = 1;
-        return MTRR_TYPE_UNCACHABLE;
+	if ( !order || rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
+					       mfn_x(mfn) | ((1UL << order) - 1)) )
+	{
+	    *ipat = 1;
+	    return MTRR_TYPE_UNCACHABLE;
+	}
+	/* Force invalid memory type so resolve_misconfig() will split it */
+	return -1;
     }
 
-    if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
-                                 mfn_x(mfn) + (1UL << order) - 1) )
-        return -1;
-
     if ( direct_mmio )
     {
         if ( (mfn_x(mfn) ^ d->arch.hvm_domain.vmx.apic_access_mfn) >> order )
@@ -795,6 +797,12 @@ int epte_get_entry_emt(struct domain *d, unsigned long gfn, mfn_t mfn,
         return MTRR_TYPE_WRBACK;
     }
 
+    if ( !mfn_valid(mfn_x(mfn)) )
+    {
+	*ipat = 1;
+	return MTRR_TYPE_UNCACHABLE;
+    }
+
     if ( !need_iommu(d) && !cache_flush_permitted(d) )
     {
         *ipat = 1;
-- 
2.7.4

[-- Attachment #1.2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4938 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86: Allow write-combining on MMIO mappings again
  2017-01-26  8:57     ` [PATCH] x86: Allow write-combining on MMIO mappings again David Woodhouse
@ 2017-01-26 10:45       ` Jan Beulich
  2017-01-26 10:55         ` David Woodhouse
  2017-01-26 12:39         ` [PATCH v2] x86/ept: Allow write-combining on !mfn_valid() " David Woodhouse
  2017-01-26 14:50       ` [PATCH v3] " David Woodhouse
  1 sibling, 2 replies; 21+ messages in thread
From: Jan Beulich @ 2017-01-26 10:45 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Andrew Cooper, H. Peter Anvin, xen-devel

>>> On 26.01.17 at 09:57, <dwmw2@infradead.org> wrote:
> Since the fix for XSA-154 in commit c61a6f74f80e ("x86: enforce
> consistent cachability of MMIO mappings"), HVM guests have no longer
> been able to use PAT to obtain write-combining on MMIO because the
> 'ignore PAT' bit is set in EPT.

This is too broad a statement: MMIO pages for which mfn_valid()
is true would still allow WC use.

The only other issue with the patch that I can see is that you need
to eliminate all the hard tabs you're introducing.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86: Allow write-combining on MMIO mappings again
  2017-01-26 10:45       ` Jan Beulich
@ 2017-01-26 10:55         ` David Woodhouse
  2017-01-26 11:32           ` Jan Beulich
  2017-01-26 12:39         ` [PATCH v2] x86/ept: Allow write-combining on !mfn_valid() " David Woodhouse
  1 sibling, 1 reply; 21+ messages in thread
From: David Woodhouse @ 2017-01-26 10:55 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, H. Peter Anvin, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 519 bytes --]

On Thu, 2017-01-26 at 03:45 -0700, Jan Beulich wrote:
> 
> This is too broad a statement: MMIO pages for which mfn_valid()
> is true would still allow WC use.

... where mfn_valid() would be true for MMIO regions in the low memory
hole, but not for regions above all physical memory?

> The only other issue with the patch that I can see is that you need
> to eliminate all the hard tabs you're introducing.

Bad emacs; no biscuit. I shall teach it that we're not in Kansas any
more, and repost. Thanks.


[-- Attachment #1.2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4938 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] x86: Allow write-combining on MMIO mappings again
  2017-01-26 10:55         ` David Woodhouse
@ 2017-01-26 11:32           ` Jan Beulich
  0 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2017-01-26 11:32 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Andrew Cooper, H. Peter Anvin, xen-devel

>>> On 26.01.17 at 11:55, <dwmw2@infradead.org> wrote:
> On Thu, 2017-01-26 at 03:45 -0700, Jan Beulich wrote:
>> 
>> This is too broad a statement: MMIO pages for which mfn_valid()
>> is true would still allow WC use.
> 
> ... where mfn_valid() would be true for MMIO regions in the low memory
> hole, but not for regions above all physical memory?

Even for pages immediately above physical memory mfn_valid()
may be true, iirc.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2] x86/ept: Allow write-combining on !mfn_valid() MMIO mappings again
  2017-01-26 10:45       ` Jan Beulich
  2017-01-26 10:55         ` David Woodhouse
@ 2017-01-26 12:39         ` David Woodhouse
  2017-01-26 14:35           ` Jan Beulich
  1 sibling, 1 reply; 21+ messages in thread
From: David Woodhouse @ 2017-01-26 12:39 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, H. Peter Anvin, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 3704 bytes --]

From: David Woodhouse <dwmw@amazon.com>

For some MMIO regions, such as those high above RAM, mfn_valid() will
return false.

Since the fix for XSA-154 in commit c61a6f74f80e ("x86: enforce
consistent cachability of MMIO mappings"), guests have no longer been
able to use PAT to obtain write-combining on such regions because the
'ignore PAT' bit is set in EPT.

We probably want to err on the side of caution and preserve that
behaviour for addresses in mmio_ro_ranges, but not for normal MMIO
mappings. That necessitates a slight refactoring to check mfn_valid()
later, and let the MMIO case get through to the right code path.

Since we're not bailing out for !mfn_valid() immediately, the range
checks need to be adjusted to cope — simply by masking in the low bits
to account for 'order' instead of adding, to avoid overflow when the mfn
is INVALID_MFN (which happens on unmap, since we carefully call this
function to fill in the EMT even though the PTE won't be valid).

The range checks are also slightly refactored to put only one of them in
the fast path in the common case. If it doesn't overlap, then it
*definitely* isn't contained, so we don't need both checks. And if it
overlaps and is only one page, then it definitely *is* contained.

Finally, add a comment clarifying how that 'return -1' works — it isn't
returning an error and causing the mapping to fail; it relies on
resolve_misconfig() being able to split the mapping later. So it's
*only* sane to do it where order>0 and the 'problem' will be solved by
splitting the large page. Not for blindly returning 'error', which I was
tempted to do in my first attempt.

Signed-off-by: David Woodhouse <dwmw@amazon.com>
---
 xen/arch/x86/hvm/mtrr.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
index 709759c..41ae8b4 100644
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -773,18 +773,20 @@ int epte_get_entry_emt(struct domain *d, unsigned long gfn, mfn_t mfn,
     if ( v->domain != d )
         v = d->vcpu ? d->vcpu[0] : NULL;
 
-    if ( !mfn_valid(mfn_x(mfn)) ||
-         rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
-                                 mfn_x(mfn) + (1UL << order) - 1) )
+    /* Mask, not add, for order so it works with INVALID_MFN on unmapping */
+    if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
+				 mfn_x(mfn) | ((1UL << order) - 1)) )
     {
-        *ipat = 1;
-        return MTRR_TYPE_UNCACHABLE;
+	if ( !order || rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
+					       mfn_x(mfn) | ((1UL << order) - 1)) )
+	{
+	    *ipat = 1;
+	    return MTRR_TYPE_UNCACHABLE;
+	}
+	/* Force invalid memory type so resolve_misconfig() will split it */
+	return -1;
     }
 
-    if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
-                                 mfn_x(mfn) + (1UL << order) - 1) )
-        return -1;
-
     if ( direct_mmio )
     {
         if ( (mfn_x(mfn) ^ d->arch.hvm_domain.vmx.apic_access_mfn) >> order )
@@ -795,6 +797,12 @@ int epte_get_entry_emt(struct domain *d, unsigned long gfn, mfn_t mfn,
         return MTRR_TYPE_WRBACK;
     }
 
+    if ( !mfn_valid(mfn_x(mfn)) )
+    {
+	*ipat = 1;
+	return MTRR_TYPE_UNCACHABLE;
+    }
+
     if ( !need_iommu(d) && !cache_flush_permitted(d) )
     {
         *ipat = 1;
-- 
2.7.4

[-- Attachment #1.2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4938 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] x86/ept: Allow write-combining on !mfn_valid() MMIO mappings again
  2017-01-26 12:39         ` [PATCH v2] x86/ept: Allow write-combining on !mfn_valid() " David Woodhouse
@ 2017-01-26 14:35           ` Jan Beulich
  2017-01-26 14:42             ` David Woodhouse
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2017-01-26 14:35 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Andrew Cooper, H. Peter Anvin, xen-devel

>>> On 26.01.17 at 13:39, <dwmw2@infradead.org> wrote:
> --- a/xen/arch/x86/hvm/mtrr.c
> +++ b/xen/arch/x86/hvm/mtrr.c
> @@ -773,18 +773,20 @@ int epte_get_entry_emt(struct domain *d, unsigned long gfn, mfn_t mfn,
>      if ( v->domain != d )
>          v = d->vcpu ? d->vcpu[0] : NULL;
>  
> -    if ( !mfn_valid(mfn_x(mfn)) ||
> -         rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
> -                                 mfn_x(mfn) + (1UL << order) - 1) )
> +    /* Mask, not add, for order so it works with INVALID_MFN on unmapping */
> +    if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
> +				 mfn_x(mfn) | ((1UL << order) - 1)) )

Hmm, didn't you say you'd take care of the hard tabs?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] x86/ept: Allow write-combining on !mfn_valid() MMIO mappings again
  2017-01-26 14:35           ` Jan Beulich
@ 2017-01-26 14:42             ` David Woodhouse
  0 siblings, 0 replies; 21+ messages in thread
From: David Woodhouse @ 2017-01-26 14:42 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, H. Peter Anvin, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 275 bytes --]

On Thu, 2017-01-26 at 07:35 -0700, Jan Beulich wrote:
> 
> Hmm, didn't you say you'd take care of the hard tabs?

Er... yes. But it seems I neglected to specify whether I would also
*save* the resulting file, 'git commit --amend', and actually send that
version. Sorry.

[-- Attachment #1.2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4938 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v3] x86/ept: Allow write-combining on !mfn_valid() MMIO mappings again
  2017-01-26  8:57     ` [PATCH] x86: Allow write-combining on MMIO mappings again David Woodhouse
  2017-01-26 10:45       ` Jan Beulich
@ 2017-01-26 14:50       ` David Woodhouse
  2017-01-26 15:48         ` Jan Beulich
  2017-02-08 16:04         ` David Woodhouse
  1 sibling, 2 replies; 21+ messages in thread
From: David Woodhouse @ 2017-01-26 14:50 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, H. Peter Anvin, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 3457 bytes --]

From: David Woodhouse <dwmw@amazon.com>

For some MMIO regions, such as those high above RAM, mfn_valid() will
return false.

Since the fix for XSA-154 in commit c61a6f74f80e ("x86: enforce
consistent cachability of MMIO mappings"), guests have no longer been
able to use PAT to obtain write-combining on such regions because the
'ignore PAT' bit is set in EPT.

We probably want to err on the side of caution and preserve that
behaviour for addresses in mmio_ro_ranges, but not for normal MMIO
mappings. That necessitates a slight refactoring to check mfn_valid()
later, and let the MMIO case get through to the right code path.

Since we're not bailing out for !mfn_valid() immediately, the range
checks need to be adjusted to cope — simply by masking in the low bits
to account for 'order' instead of adding, to avoid overflow when the mfn
is INVALID_MFN (which happens on unmap, since we carefully call this
function to fill in the EMT even though the PTE won't be valid).

The range checks are also slightly refactored to put only one of them in
the fast path in the common case. If it doesn't overlap, then it
*definitely* isn't contained, so we don't need both checks. And if it
overlaps and is only one page, then it definitely *is* contained.

Finally, add a comment clarifying how that 'return -1' works — it isn't
returning an error and causing the mapping to fail; it relies on
resolve_misconfig() being able to split the mapping later. So it's
*only* sane to do it where order>0 and the 'problem' will be solved by
splitting the large page. Not for blindly returning 'error', which I was
tempted to do in my first attempt.

Signed-off-by: David Woodhouse <dwmw@amazon.com>
---
 xen/arch/x86/hvm/mtrr.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
index 709759c..8fef756 100644
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -773,17 +773,19 @@ int epte_get_entry_emt(struct domain *d, unsigned long gfn, mfn_t mfn,
     if ( v->domain != d )
         v = d->vcpu ? d->vcpu[0] : NULL;
 
-    if ( !mfn_valid(mfn_x(mfn)) ||
-         rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
-                                 mfn_x(mfn) + (1UL << order) - 1) )
-    {
-        *ipat = 1;
-        return MTRR_TYPE_UNCACHABLE;
-    }
-
+    /* Mask, not add, for order so it works with INVALID_MFN on unmapping */
     if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
-                                 mfn_x(mfn) + (1UL << order) - 1) )
+                                 mfn_x(mfn) | ((1UL << order) - 1)) )
+    {
+        if ( !order || rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
+                                               mfn_x(mfn) | ((1UL << order) - 1)) )
+        {
+            *ipat = 1;
+            return MTRR_TYPE_UNCACHABLE;
+        }
+        /* Force invalid memory type so resolve_misconfig() will split it */
         return -1;
+    }
 
     if ( direct_mmio )
     {
@@ -795,6 +797,12 @@ int epte_get_entry_emt(struct domain *d, unsigned long gfn, mfn_t mfn,
         return MTRR_TYPE_WRBACK;
     }
 
+    if ( !mfn_valid(mfn_x(mfn)) )
+    {
+        *ipat = 1;
+        return MTRR_TYPE_UNCACHABLE;
+    }
+
     if ( !need_iommu(d) && !cache_flush_permitted(d) )
     {
         *ipat = 1;
-- 
2.7.4

[-- Attachment #1.2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4938 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] x86/ept: Allow write-combining on !mfn_valid() MMIO mappings again
  2017-01-26 14:50       ` [PATCH v3] " David Woodhouse
@ 2017-01-26 15:48         ` Jan Beulich
  2017-01-27 15:36           ` Konrad Rzeszutek Wilk
  2017-02-07  5:05           ` Tian, Kevin
  2017-02-08 16:04         ` David Woodhouse
  1 sibling, 2 replies; 21+ messages in thread
From: Jan Beulich @ 2017-01-26 15:48 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Andrew Cooper, Kevin Tian, H. Peter Anvin, Jun Nakajima, xen-devel

>>> On 26.01.17 at 15:50, <dwmw2@infradead.org> wrote:
> From: David Woodhouse <dwmw@amazon.com>
> 
> For some MMIO regions, such as those high above RAM, mfn_valid() will
> return false.
> 
> Since the fix for XSA-154 in commit c61a6f74f80e ("x86: enforce
> consistent cachability of MMIO mappings"), guests have no longer been
> able to use PAT to obtain write-combining on such regions because the
> 'ignore PAT' bit is set in EPT.
> 
> We probably want to err on the side of caution and preserve that
> behaviour for addresses in mmio_ro_ranges, but not for normal MMIO
> mappings. That necessitates a slight refactoring to check mfn_valid()
> later, and let the MMIO case get through to the right code path.
> 
> Since we're not bailing out for !mfn_valid() immediately, the range
> checks need to be adjusted to cope — simply by masking in the low bits
> to account for 'order' instead of adding, to avoid overflow when the mfn
> is INVALID_MFN (which happens on unmap, since we carefully call this
> function to fill in the EMT even though the PTE won't be valid).
> 
> The range checks are also slightly refactored to put only one of them in
> the fast path in the common case. If it doesn't overlap, then it
> *definitely* isn't contained, so we don't need both checks. And if it
> overlaps and is only one page, then it definitely *is* contained.
> 
> Finally, add a comment clarifying how that 'return -1' works — it isn't
> returning an error and causing the mapping to fail; it relies on
> resolve_misconfig() being able to split the mapping later. So it's
> *only* sane to do it where order>0 and the 'problem' will be solved by
> splitting the large page. Not for blindly returning 'error', which I was
> tempted to do in my first attempt.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

But before committing I'd prefer to have a VMX maintainer ack
too.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] x86/ept: Allow write-combining on !mfn_valid() MMIO mappings again
  2017-01-26 15:48         ` Jan Beulich
@ 2017-01-27 15:36           ` Konrad Rzeszutek Wilk
  2017-02-06 11:33             ` David Woodhouse
  2017-02-07  5:05           ` Tian, Kevin
  1 sibling, 1 reply; 21+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-01-27 15:36 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Andrew Cooper, xen-devel, Jun Nakajima,
	H. Peter Anvin, David Woodhouse

. snip ..
> > Signed-off-by: David Woodhouse <dwmw@amazon.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> But before committing I'd prefer to have a VMX maintainer ack
> too.

Who is gone until Feb yth (Spring festival in China).

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Xen Security Advisory 154 (CVE-2016-2270) - x86: inconsistent cachability flags on guest mappings
  2017-01-25 14:21   ` Jan Beulich
                       ` (2 preceding siblings ...)
  2017-01-26  8:57     ` [PATCH] x86: Allow write-combining on MMIO mappings again David Woodhouse
@ 2017-02-01 20:23     ` David Woodhouse
  3 siblings, 0 replies; 21+ messages in thread
From: David Woodhouse @ 2017-02-01 20:23 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, H. Peter Anvin, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 880 bytes --]

On Wed, 2017-01-25 at 07:21 -0700, Jan Beulich wrote:
> 
> Well, in the context of this XSA we've asked both of them, and iirc
> we've got a vague reply from Intel and none from AMD. In fact we
> did defer the XSA for quite a bit waiting for any useful feedback.
> To AMD's advantage I'd like to add though that iirc they're a little
> more clear in their PM about the specific question of UC and WC
> you raise: They group the various cacheabilities into two groups
> (cacheable and uncacheable) and require there to only not be
> any mixture between groups. Iirc Intel's somewhat vague reply
> allowed us to conclude we're likely safe that way on their side too.

It would be good to get a definitive answer from Intel, to match AMD's.
That's basically why I added hpa to CC, in fact.

Peter, is there any possibility of a clarification here, please?

Thanks.

[-- Attachment #1.2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4938 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] x86/ept: Allow write-combining on !mfn_valid() MMIO mappings again
  2017-01-27 15:36           ` Konrad Rzeszutek Wilk
@ 2017-02-06 11:33             ` David Woodhouse
  2017-02-07  5:08               ` Tian, Kevin
  2017-04-14  7:51               ` Tian, Kevin
  0 siblings, 2 replies; 21+ messages in thread
From: David Woodhouse @ 2017-02-06 11:33 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Jan Beulich
  Cc: Andrew Cooper, Kevin Tian, H. Peter Anvin, Jun Nakajima, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 555 bytes --]

On Fri, 2017-01-27 at 10:36 -0500, Konrad Rzeszutek Wilk wrote:
> . snip ..
> > 
> > > 
> > > Signed-off-by: David Woodhouse <dwmw@amazon.com>
> > Reviewed-by: Jan Beulich <jbeulich@suse.com>
> > 
> > But before committing I'd prefer to have a VMX maintainer ack
> > too.
> Who is gone until Feb yth (Spring festival in China).

Not sure what number that 'y' was supposed to be, but I see Kevin
posting today... :)

Would also be very good to have a definitive answer about the safety of
mixing WC and UC on MMIO mappings on Intel hardware.

[-- Attachment #1.2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4938 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] x86/ept: Allow write-combining on !mfn_valid() MMIO mappings again
  2017-01-26 15:48         ` Jan Beulich
  2017-01-27 15:36           ` Konrad Rzeszutek Wilk
@ 2017-02-07  5:05           ` Tian, Kevin
  1 sibling, 0 replies; 21+ messages in thread
From: Tian, Kevin @ 2017-02-07  5:05 UTC (permalink / raw)
  To: Jan Beulich, David Woodhouse
  Cc: Andrew Cooper, Anvin, H Peter, Nakajima, Jun, xen-devel

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, January 26, 2017 11:48 PM
> 
> >>> On 26.01.17 at 15:50, <dwmw2@infradead.org> wrote:
> > From: David Woodhouse <dwmw@amazon.com>
> >
> > For some MMIO regions, such as those high above RAM, mfn_valid() will
> > return false.
> >
> > Since the fix for XSA-154 in commit c61a6f74f80e ("x86: enforce
> > consistent cachability of MMIO mappings"), guests have no longer been
> > able to use PAT to obtain write-combining on such regions because the
> > 'ignore PAT' bit is set in EPT.
> >
> > We probably want to err on the side of caution and preserve that
> > behaviour for addresses in mmio_ro_ranges, but not for normal MMIO
> > mappings. That necessitates a slight refactoring to check mfn_valid()
> > later, and let the MMIO case get through to the right code path.
> >
> > Since we're not bailing out for !mfn_valid() immediately, the range
> > checks need to be adjusted to cope — simply by masking in the low bits
> > to account for 'order' instead of adding, to avoid overflow when the mfn
> > is INVALID_MFN (which happens on unmap, since we carefully call this
> > function to fill in the EMT even though the PTE won't be valid).
> >
> > The range checks are also slightly refactored to put only one of them in
> > the fast path in the common case. If it doesn't overlap, then it
> > *definitely* isn't contained, so we don't need both checks. And if it
> > overlaps and is only one page, then it definitely *is* contained.
> >
> > Finally, add a comment clarifying how that 'return -1' works — it isn't
> > returning an error and causing the mapping to fail; it relies on
> > resolve_misconfig() being able to split the mapping later. So it's
> > *only* sane to do it where order>0 and the 'problem' will be solved by
> > splitting the large page. Not for blindly returning 'error', which I was
> > tempted to do in my first attempt.
> >
> > Signed-off-by: David Woodhouse <dwmw@amazon.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> But before committing I'd prefer to have a VMX maintainer ack
> too.
> 

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] x86/ept: Allow write-combining on !mfn_valid() MMIO mappings again
  2017-02-06 11:33             ` David Woodhouse
@ 2017-02-07  5:08               ` Tian, Kevin
  2017-04-14  7:51               ` Tian, Kevin
  1 sibling, 0 replies; 21+ messages in thread
From: Tian, Kevin @ 2017-02-07  5:08 UTC (permalink / raw)
  To: David Woodhouse, Konrad Rzeszutek Wilk, Jan Beulich
  Cc: Andrew Cooper, Anvin, H Peter, Nakajima, Jun, xen-devel

> From: David Woodhouse [mailto:dwmw2@infradead.org]
> Sent: Monday, February 06, 2017 7:33 PM
> 
> On Fri, 2017-01-27 at 10:36 -0500, Konrad Rzeszutek Wilk wrote:
> > . snip ..
> > >
> > > >
> > > > Signed-off-by: David Woodhouse <dwmw@amazon.com>
> > > Reviewed-by: Jan Beulich <jbeulich@suse.com>
> > >
> > > But before committing I'd prefer to have a VMX maintainer ack
> > > too.
> > Who is gone until Feb yth (Spring festival in China).
> 
> Not sure what number that 'y' was supposed to be, but I see Kevin
> posting today... :)

'y' is '6' for me. Lots of things have piled up to catch up on during my
first day back.
Thanks for drawing my attention here.

> 
> Would also be very good to have a definitive answer about the safety of
> mixing WC and UC on MMIO mappings on Intel hardware.

I don't quite recall the previous conversation - my memory matches the
answer above. But anyway, let me ping our hardware guys...

Thanks
Kevin
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] x86/ept: Allow write-combining on !mfn_valid() MMIO mappings again
  2017-01-26 14:50       ` [PATCH v3] " David Woodhouse
  2017-01-26 15:48         ` Jan Beulich
@ 2017-02-08 16:04         ` David Woodhouse
  1 sibling, 0 replies; 21+ messages in thread
From: David Woodhouse @ 2017-02-08 16:04 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, H. Peter Anvin, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 454 bytes --]

On Thu, 2017-01-26 at 14:50 +0000, David Woodhouse wrote:
> 
> Finally, add a comment clarifying how that 'return -1' works — it isn't
> returning an error and causing the mapping to fail; it relies on…

Btw, something odd happened when committing this. That U+2014 emdash in
the first (cited) line above got turned into a U+0097 'END OF GUARDED
AREA' character. I've no idea why — just using 'git am' should have
worked perfectly normally.

[-- Attachment #1.2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 4938 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] x86/ept: Allow write-combining on !mfn_valid() MMIO mappings again
  2017-02-06 11:33             ` David Woodhouse
  2017-02-07  5:08               ` Tian, Kevin
@ 2017-04-14  7:51               ` Tian, Kevin
  1 sibling, 0 replies; 21+ messages in thread
From: Tian, Kevin @ 2017-04-14  7:51 UTC (permalink / raw)
  To: David Woodhouse, Konrad Rzeszutek Wilk, Jan Beulich
  Cc: Andrew Cooper, Anvin, H Peter, Nakajima, Jun, xen-devel

> From: David Woodhouse [mailto:dwmw2@infradead.org]
> Sent: Monday, February 6, 2017 7:33 PM
> 
> On Fri, 2017-01-27 at 10:36 -0500, Konrad Rzeszutek Wilk wrote:
> > . snip ..
> > >
> > > >
> > > > Signed-off-by: David Woodhouse <dwmw@amazon.com>
> > > Reviewed-by: Jan Beulich <jbeulich@suse.com>
> > >
> > > But before committing I'd prefer to have a VMX maintainer ack
> > > too.
> > Who is gone until Feb yth (Spring festival in China).
> 
> Not sure what number that 'y' was supposed to be, but I see Kevin
> posting today... :)
> 
> Would also be very good to have a definitive answer about the safety of
> mixing WC and UC on MMIO mappings on Intel hardware.

The answer is yes: mixing WC and UC on MMIO mappings is safe.

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2017-04-14  7:51 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-17 12:28 Xen Security Advisory 154 (CVE-2016-2270) - x86: inconsistent cachability flags on guest mappings Xen.org security team
2017-01-25 14:08 ` David Woodhouse
2017-01-25 14:21   ` Jan Beulich
2017-01-25 14:34     ` David Woodhouse
2017-01-25 16:08     ` David Woodhouse
2017-01-26  8:57     ` [PATCH] x86: Allow write-combining on MMIO mappings again David Woodhouse
2017-01-26 10:45       ` Jan Beulich
2017-01-26 10:55         ` David Woodhouse
2017-01-26 11:32           ` Jan Beulich
2017-01-26 12:39         ` [PATCH v2] x86/ept: Allow write-combining on !mfn_valid() " David Woodhouse
2017-01-26 14:35           ` Jan Beulich
2017-01-26 14:42             ` David Woodhouse
2017-01-26 14:50       ` [PATCH v3] " David Woodhouse
2017-01-26 15:48         ` Jan Beulich
2017-01-27 15:36           ` Konrad Rzeszutek Wilk
2017-02-06 11:33             ` David Woodhouse
2017-02-07  5:08               ` Tian, Kevin
2017-04-14  7:51               ` Tian, Kevin
2017-02-07  5:05           ` Tian, Kevin
2017-02-08 16:04         ` David Woodhouse
2017-02-01 20:23     ` Xen Security Advisory 154 (CVE-2016-2270) - x86: inconsistent cachability flags on guest mappings David Woodhouse
