All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 0/5] Make PAT handling less brittle
@ 2022-12-22 22:31 Demi Marie Obenour
  2022-12-22 22:31 ` [PATCH v6 1/5] x86/mm: Avoid hard-coding PAT in get_page_from_l1e() Demi Marie Obenour
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Demi Marie Obenour @ 2022-12-22 22:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Demi Marie Obenour, Marek Marczykowski-Górecki, Jan Beulich,
	Andrew Cooper, Roger Pau Monné,
	Wei Liu, Tim Deegan, George Dunlap

While working on Qubes OS Marek found out that there were some PAT hacks
in the Linux i195 driver.  I decided to make Xen use Linux’s PAT to see
if it solved the graphics glitches that were observed; it did.  This
required a substantial amount of preliminary work that is useful even
without using Linux’s PAT.

Patches 1 through 4 are the preliminary work and I would like them to be
accepted into upstream Xen.  Patch 4 does break ABI by rejecting the
unused PAT entries, but this will only impact buggy PV guests and can be
disabled with a Xen command-line option.  Patch 5 actually switches to
Linux’s PAT and is NOT intended to be merged (at least for now) as it
would at a minimum break migration of PV guests from hosts that do not
have the patch.

Only patches 4 and 5 actually change Xen’s observable behavior.
Patch 1 is a prerequisite that removes the last place where Xen
hard-codes its current PAT value.  Patch 2 is strictly cleanup.  Patch 3
makes changing the PAT much less error-prone, as problems with the PAT
or with the associated _PAGE_* constants will be detected at compile
time.

Demi Marie Obenour (5):
  x86/mm: Avoid hard-coding PAT in get_page_from_l1e()
  x86: Remove MEMORY_NUM_TYPES and NO_HARDCODE_MEM_TYPE
  x86/mm: make code robust to future PAT changes
  x86/mm: Reject invalid cacheability in PV guests by default
  x86: Use Linux's PAT

 docs/misc/xen-command-line.pandoc    |  11 ++
 xen/arch/x86/hvm/mtrr.c              |  18 ++--
 xen/arch/x86/include/asm/mtrr.h      |   2 -
 xen/arch/x86/include/asm/page.h      |   4 +-
 xen/arch/x86/include/asm/processor.h |  10 +-
 xen/arch/x86/mm.c                    | 146 ++++++++++++++++++++++++---
 xen/arch/x86/mm/shadow/multi.c       |   2 +-
 7 files changed, 162 insertions(+), 31 deletions(-)

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v6 1/5] x86/mm: Avoid hard-coding PAT in get_page_from_l1e()
  2022-12-22 22:31 [PATCH v6 0/5] Make PAT handling less brittle Demi Marie Obenour
@ 2022-12-22 22:31 ` Demi Marie Obenour
  2022-12-22 22:31 ` [PATCH v6 2/5] x86: Remove MEMORY_NUM_TYPES and NO_HARDCODE_MEM_TYPE Demi Marie Obenour
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Demi Marie Obenour @ 2022-12-22 22:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Demi Marie Obenour, Marek Marczykowski-Górecki, Jan Beulich,
	Andrew Cooper, Roger Pau Monné,
	Wei Liu, Tim Deegan, George Dunlap

get_page_from_l1e() relied on Xen's choice of PAT, which is brittle in
the face of future PAT changes.  Instead, compute the actual cacheability
used by the CPU and switch on that, as this will work no matter what PAT
Xen uses.

No functional change intended.  This code is itself questionable and may
be removed in the future, but removing it would be an observable
behavior change and so is out of scope for this patch series.

Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Changes since v5:
- Make comment in get_page_from_l1e() future-proof.
- Explicitly state how known-uncacheable and potentially-cacheable types
  are handled.

Changes since v4:
- Do not add new pte_flags_to_cacheability() helper, as this code may be
  removed in the near future and so adding a new helper for it is a bad
  idea.
- Do not BUG() in the event of an unexpected cacheability.  This cannot
  happen, but it is simpler to force such types to UC than to prove that
  the BUG() is not reachable.

Changes since v3:
- Compute and use the actual cacheability as seen by the processor.

Changes since v2:
- Improve commit message.
---
 xen/arch/x86/mm.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 1bda1ba697b434b6c884f17e599aa9b6d3b3dd76..3558ca215b02a517d55d75329d645ae5905424e4 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -959,14 +959,16 @@ get_page_from_l1e(
             flip = _PAGE_RW;
         }
 
-        switch ( l1f & PAGE_CACHE_ATTRS )
+        switch ( 0xFF & (XEN_MSR_PAT >> (8 * pte_flags_to_cacheattr(l1f))) )
         {
-        case 0: /* WB */
-            flip |= _PAGE_PWT | _PAGE_PCD;
+        case X86_MT_UC:
+        case X86_MT_UCM:
+        case X86_MT_WC:
+            /* not cacheable, allow */
             break;
-        case _PAGE_PWT: /* WT */
-        case _PAGE_PWT | _PAGE_PAT: /* WP */
-            flip |= _PAGE_PCD | (l1f & _PAGE_PAT);
+        default:
+            /* potentially cacheable, force to UC */
+            flip |= ((l1f & PAGE_CACHE_ATTRS) ^ _PAGE_UC);
             break;
         }
 
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v6 2/5] x86: Remove MEMORY_NUM_TYPES and NO_HARDCODE_MEM_TYPE
  2022-12-22 22:31 [PATCH v6 0/5] Make PAT handling less brittle Demi Marie Obenour
  2022-12-22 22:31 ` [PATCH v6 1/5] x86/mm: Avoid hard-coding PAT in get_page_from_l1e() Demi Marie Obenour
@ 2022-12-22 22:31 ` Demi Marie Obenour
  2022-12-22 22:31 ` [PATCH v6 3/5] x86/mm: make code robust to future PAT changes Demi Marie Obenour
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Demi Marie Obenour @ 2022-12-22 22:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Demi Marie Obenour, Marek Marczykowski-Górecki, Jan Beulich,
	Andrew Cooper, Roger Pau Monné,
	Wei Liu, Tim Deegan, George Dunlap

No functional change intended.

Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
---
Changes since v2:

- Keep MTRR_NUM_TYPES and adjust commit message accordingly
---
 xen/arch/x86/hvm/mtrr.c         | 18 +++++++++---------
 xen/arch/x86/include/asm/mtrr.h |  2 --
 xen/arch/x86/mm/shadow/multi.c  |  2 +-
 3 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
index 093103f6c768cf64f880d1b20e1c14f5918c1250..05e978041d62fd0d559462de181a04bef8a5bca9 100644
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -38,7 +38,7 @@ static const uint8_t pat_entry_2_pte_flags[8] = {
 
 /* Effective mm type lookup table, according to MTRR and PAT. */
 static const uint8_t mm_type_tbl[MTRR_NUM_TYPES][X86_NUM_MT] = {
-#define RS MEMORY_NUM_TYPES
+#define RS MTRR_NUM_TYPES
 #define UC X86_MT_UC
 #define WB X86_MT_WB
 #define WC X86_MT_WC
@@ -66,9 +66,9 @@ static const uint8_t mm_type_tbl[MTRR_NUM_TYPES][X86_NUM_MT] = {
  * Reverse lookup table, to find a pat type according to MTRR and effective
  * memory type. This table is dynamically generated.
  */
-static uint8_t __read_mostly mtrr_epat_tbl[MTRR_NUM_TYPES][MEMORY_NUM_TYPES] =
-    { [0 ... MTRR_NUM_TYPES-1] =
-        { [0 ... MEMORY_NUM_TYPES-1] = INVALID_MEM_TYPE }
+static uint8_t __read_mostly mtrr_epat_tbl[MTRR_NUM_TYPES][MTRR_NUM_TYPES] =
+    { [0 ... MTRR_NUM_TYPES - 1] =
+        { [0 ... MTRR_NUM_TYPES - 1] = INVALID_MEM_TYPE }
     };
 
 /* Lookup table for PAT entry of a given PAT value in host PAT. */
@@ -85,7 +85,7 @@ static int __init cf_check hvm_mtrr_pat_init(void)
         {
             unsigned int tmp = mm_type_tbl[i][j];
 
-            if ( tmp < MEMORY_NUM_TYPES )
+            if ( tmp < MTRR_NUM_TYPES )
                 mtrr_epat_tbl[i][tmp] = j;
         }
     }
@@ -317,11 +317,11 @@ static uint8_t effective_mm_type(struct mtrr_state *m,
                                  uint8_t gmtrr_mtype)
 {
     uint8_t mtrr_mtype, pat_value;
-   
+
     /* if get_pat_flags() gives a dedicated MTRR type,
      * just use it
-     */ 
-    if ( gmtrr_mtype == NO_HARDCODE_MEM_TYPE )
+     */
+    if ( gmtrr_mtype == MTRR_NUM_TYPES )
         mtrr_mtype = mtrr_get_type(m, gpa, 0);
     else
         mtrr_mtype = gmtrr_mtype;
@@ -346,7 +346,7 @@ uint32_t get_pat_flags(struct vcpu *v,
     /* 1. Get the effective memory type of guest physical address,
      * with the pair of guest MTRR and PAT
      */
-    guest_eff_mm_type = effective_mm_type(g, pat, gpaddr, 
+    guest_eff_mm_type = effective_mm_type(g, pat, gpaddr,
                                           gl1e_flags, gmtrr_mtype);
     /* 2. Get the memory type of host physical address, with MTRR */
     shadow_mtrr_type = mtrr_get_type(&mtrr_state, spaddr, 0);
diff --git a/xen/arch/x86/include/asm/mtrr.h b/xen/arch/x86/include/asm/mtrr.h
index e4f6ca6048334b2094a1836cc2f298453641232f..4b7f840a965954cc4b59698327a37e47026893a4 100644
--- a/xen/arch/x86/include/asm/mtrr.h
+++ b/xen/arch/x86/include/asm/mtrr.h
@@ -4,8 +4,6 @@
 #include <xen/mm.h>
 
 #define MTRR_NUM_TYPES       X86_MT_UCM
-#define MEMORY_NUM_TYPES     MTRR_NUM_TYPES
-#define NO_HARDCODE_MEM_TYPE MTRR_NUM_TYPES
 
 #define NORMAL_CACHE_MODE          0
 #define NO_FILL_CACHE_MODE         2
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index f5f7ff021bd9e057c5b6f6329de7acb5ef05d58f..1faf9940db6b0afefc5977c00c00fb6a39cd27d2 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -578,7 +578,7 @@ _sh_propagate(struct vcpu *v,
                             gflags,
                             gfn_to_paddr(target_gfn),
                             mfn_to_maddr(target_mfn),
-                            NO_HARDCODE_MEM_TYPE);
+                            MTRR_NUM_TYPES);
             }
     }
 
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v6 3/5] x86/mm: make code robust to future PAT changes
  2022-12-22 22:31 [PATCH v6 0/5] Make PAT handling less brittle Demi Marie Obenour
  2022-12-22 22:31 ` [PATCH v6 1/5] x86/mm: Avoid hard-coding PAT in get_page_from_l1e() Demi Marie Obenour
  2022-12-22 22:31 ` [PATCH v6 2/5] x86: Remove MEMORY_NUM_TYPES and NO_HARDCODE_MEM_TYPE Demi Marie Obenour
@ 2022-12-22 22:31 ` Demi Marie Obenour
  2023-01-05 14:01   ` Jan Beulich
  2022-12-22 22:31 ` [PATCH v6 4/5] x86/mm: Reject invalid cacheability in PV guests by default Demi Marie Obenour
  2022-12-22 22:31 ` [PATCH v6 5/5] x86: Use Linux's PAT Demi Marie Obenour
  4 siblings, 1 reply; 14+ messages in thread
From: Demi Marie Obenour @ 2022-12-22 22:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Demi Marie Obenour, Marek Marczykowski-Górecki, Jan Beulich,
	Andrew Cooper, Roger Pau Monné,
	Wei Liu, Tim Deegan, George Dunlap

It may be desirable to change Xen's PAT for various reasons.  This
requires changes to several _PAGE_* macros as well.  Add static
assertions to check that XEN_MSR_PAT is consistent with the _PAGE_*
macros, and that _PAGE_WB is 0 as required by Linux.

Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
---
Changes since v5:
- Remove unhelpful comment.
- Move a BUILD_BUG_ON.
- Use fewer hardcoded constants in PTE_FLAGS_TO_CACHEATTR.
- Fix coding style.

Changes since v4:
- Add lots of comments explaining what the various BUILD_BUG_ON()s mean.

Changes since v3:
- Refactor some macros
- Avoid including a string literal in BUILD_BUG_ON
---
 xen/arch/x86/mm.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 3558ca215b02a517d55d75329d645ae5905424e4..65ba0f58ed8c26ac0343528303851739981c03bd 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -6352,6 +6352,11 @@ unsigned long get_upper_mfn_bound(void)
     return min(max_mfn, 1UL << (paddr_bits - PAGE_SHIFT)) - 1;
 }
 
+
+/*
+ * A bunch of static assertions to check that the XEN_MSR_PAT is valid
+ * and consistent with the _PAGE_* macros, and that _PAGE_WB is zero.
+ */
 static void __init __maybe_unused build_assertions(void)
 {
     /*
@@ -6361,6 +6366,71 @@ static void __init __maybe_unused build_assertions(void)
      * using different PATs will not work.
      */
     BUILD_BUG_ON(XEN_MSR_PAT != 0x050100070406ULL);
+
+    /*
+     * _PAGE_WB must be zero for several reasons, not least because Linux
+     * PV guests assume it.
+     */
+    BUILD_BUG_ON(_PAGE_WB);
+
+#define PAT_ENTRY(v)                                                           \
+    (BUILD_BUG_ON_ZERO(((v) < 0) || ((v) > 7)) +                               \
+     (0xFF & (XEN_MSR_PAT >> (8 * (v)))))
+
+    /* Validate at compile-time that v is a valid value for a PAT entry */
+#define CHECK_PAT_ENTRY_VALUE(v)                                               \
+    BUILD_BUG_ON((v) < 0 || (v) > 7 ||                                         \
+                 (v) == X86_MT_RSVD_2 || (v) == X86_MT_RSVD_3)
+
+    /* Validate at compile-time that PAT entry v is valid */
+#define CHECK_PAT_ENTRY(v) CHECK_PAT_ENTRY_VALUE(PAT_ENTRY(v))
+
+    /*
+     * If one of these trips, the corresponding entry in XEN_MSR_PAT is invalid.
+     * This would cause Xen to crash (with #GP) at startup.
+     */
+    CHECK_PAT_ENTRY(0);
+    CHECK_PAT_ENTRY(1);
+    CHECK_PAT_ENTRY(2);
+    CHECK_PAT_ENTRY(3);
+    CHECK_PAT_ENTRY(4);
+    CHECK_PAT_ENTRY(5);
+    CHECK_PAT_ENTRY(6);
+    CHECK_PAT_ENTRY(7);
+
+#undef CHECK_PAT_ENTRY
+#undef CHECK_PAT_ENTRY_VALUE
+
+    /* Macro version of pte_flags_to_cacheattr(), for use in BUILD_BUG_ON()s */
+#define PTE_FLAGS_TO_CACHEATTR(pte_value)                                      \
+    ((((pte_value) & _PAGE_PAT) >> 5) |                                        \
+     (((pte_value) & (_PAGE_PCD | _PAGE_PWT)) >> 3))
+
+    /* Check that a PAT-related _PAGE_* macro is correct */
+#define CHECK_PAGE_VALUE(page_value) do {                                      \
+    /* Check that the _PAGE_* macros only use bits from PAGE_CACHE_ATTRS */    \
+    BUILD_BUG_ON(((_PAGE_ ## page_value) & PAGE_CACHE_ATTRS) !=                \
+                 (_PAGE_ ## page_value));                                      \
+    /* Check that the _PAGE_* are consistent with XEN_MSR_PAT */               \
+    BUILD_BUG_ON(PAT_ENTRY(PTE_FLAGS_TO_CACHEATTR(_PAGE_ ## page_value)) !=    \
+                 (X86_MT_ ## page_value));                                     \
+} while ( false )
+
+    /*
+     * If one of these trips, the corresponding _PAGE_* macro is inconsistent
+     * with XEN_MSR_PAT.  This would cause Xen to use incorrect cacheability
+     * flags, with results that are unknown and possibly harmful.
+     */
+    CHECK_PAGE_VALUE(WT);
+    CHECK_PAGE_VALUE(WB);
+    CHECK_PAGE_VALUE(WC);
+    CHECK_PAGE_VALUE(UC);
+    CHECK_PAGE_VALUE(UCM);
+    CHECK_PAGE_VALUE(WP);
+
+#undef CHECK_PAGE_VALUE
+#undef PAGE_FLAGS_TO_CACHEATTR
+#undef PAT_ENTRY
 }
 
 /*
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v6 4/5] x86/mm: Reject invalid cacheability in PV guests by default
  2022-12-22 22:31 [PATCH v6 0/5] Make PAT handling less brittle Demi Marie Obenour
                   ` (2 preceding siblings ...)
  2022-12-22 22:31 ` [PATCH v6 3/5] x86/mm: make code robust to future PAT changes Demi Marie Obenour
@ 2022-12-22 22:31 ` Demi Marie Obenour
  2023-01-05 14:30   ` Jan Beulich
  2023-01-05 20:28   ` Andrew Cooper
  2022-12-22 22:31 ` [PATCH v6 5/5] x86: Use Linux's PAT Demi Marie Obenour
  4 siblings, 2 replies; 14+ messages in thread
From: Demi Marie Obenour @ 2022-12-22 22:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Demi Marie Obenour, Marek Marczykowski-Górecki, Jan Beulich,
	Andrew Cooper, Roger Pau Monné,
	Wei Liu, Tim Deegan, George Dunlap

Setting cacheability flags that are not ones specified by Xen is a bug
in the guest.  By default, return -EINVAL if a guests attempts to do
this.  The invalid-cacheability= Xen command-line flag allows the
administrator to allow such attempts or to produce

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
---
Changes since v5:
- Make parameters static and __ro_after_init.
- Replace boolean parameter allow_invalid_cacheability with string
  parameter invalid-cacheability.
- Move parameter definitions to near where they are used.
- Add documentation.

Changes since v4:
- Remove pointless BUILD_BUG_ON().
- Add comment explaining why an exception is being injected.

Changes since v3:
- Add Andrew Cooper’s Suggested-by
---
 docs/misc/xen-command-line.pandoc | 11 ++++++
 xen/arch/x86/mm.c                 | 60 ++++++++++++++++++++++++++++++-
 2 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 424b12cfb27d6ade2ec63eacb8afe5df82465451..0230a7bc17cbd4362a42ea64cea695f31f5e0f86 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1417,6 +1417,17 @@ detection of systems known to misbehave upon accesses to that port.
 ### idle_latency_factor (x86)
 > `= <integer>`
 
+### invalid-cacheability (x86)
+> `= allow | deny | trap`
+
+> Default: `deny` in release builds, otherwise `trap`
+
+Specify what happens when a PV guest tries to use one of the reserved entries in
+the PAT.  `deny` causes the attempt to be rejected with -EINVAL, `allow` allows
+the attempt, and `trap` causes a general protection fault to be raised.
+Currently, the reserved entries are marked as uncacheable in Xen's PAT, but this
+will change if new memory types are added, so guests must not rely on it.
+
 ### ioapic_ack (x86)
 > `= old | new`
 
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 65ba0f58ed8c26ac0343528303851739981c03bd..bacfb776d688f68dcbf79d83723fff329b75fd18 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1324,6 +1324,37 @@ static int put_page_from_l4e(l4_pgentry_t l4e, mfn_t l4mfn, unsigned int flags)
     return put_pt_page(l4e_get_page(l4e), mfn_to_page(l4mfn), flags);
 }
 
+enum {
+    INVALID_CACHEABILITY_ALLOW,
+    INVALID_CACHEABILITY_DENY,
+    INVALID_CACHEABILITY_TRAP,
+};
+
+#ifdef NDEBUG
+#define INVALID_CACHEABILITY_DEFAULT INVALID_CACHEABILITY_DENY
+#else
+#define INVALID_CACHEABILITY_DEFAULT INVALID_CACHEABILITY_TRAP
+#endif
+
+static __ro_after_init uint8_t invalid_cacheability =
+    INVALID_CACHEABILITY_DEFAULT;
+
+static int __init cf_check set_invalid_cacheability(const char *str)
+{
+    if (strcmp("allow", str) == 0)
+        invalid_cacheability = INVALID_CACHEABILITY_ALLOW;
+    else if (strcmp("deny", str) == 0)
+        invalid_cacheability = INVALID_CACHEABILITY_DENY;
+    else if (strcmp("trap", str) == 0)
+        invalid_cacheability = INVALID_CACHEABILITY_TRAP;
+    else
+        return -EINVAL;
+
+    return 0;
+}
+
+custom_param("invalid-cacheability", set_invalid_cacheability);
+
 static int promote_l1_table(struct page_info *page)
 {
     struct domain *d = page_get_owner(page);
@@ -1343,7 +1374,34 @@ static int promote_l1_table(struct page_info *page)
         }
         else
         {
-            switch ( ret = get_page_from_l1e(pl1e[i], d, d) )
+            l1_pgentry_t l1e = pl1e[i];
+
+            if ( invalid_cacheability != INVALID_CACHEABILITY_ALLOW )
+            {
+                switch ( l1e.l1 & PAGE_CACHE_ATTRS )
+                {
+                case _PAGE_WB:
+                case _PAGE_UC:
+                case _PAGE_UCM:
+                case _PAGE_WC:
+                case _PAGE_WT:
+                case _PAGE_WP:
+                    break;
+                default:
+                    /*
+                     * If we get here, a PV guest tried to use one of the
+                     * reserved values in Xen's PAT.  This indicates a bug
+                     * in the guest.  If requested by the user, inject #GP
+                     * to cause the guest to log a stack trace.
+                     */
+                    if ( invalid_cacheability == INVALID_CACHEABILITY_TRAP )
+                        pv_inject_hw_exception(TRAP_gp_fault, 0);
+                    ret = -EINVAL;
+                    goto fail;
+                }
+            }
+
+            switch ( ret = get_page_from_l1e(l1e, d, d) )
             {
             default:
                 goto fail;
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v6 5/5] x86: Use Linux's PAT
  2022-12-22 22:31 [PATCH v6 0/5] Make PAT handling less brittle Demi Marie Obenour
                   ` (3 preceding siblings ...)
  2022-12-22 22:31 ` [PATCH v6 4/5] x86/mm: Reject invalid cacheability in PV guests by default Demi Marie Obenour
@ 2022-12-22 22:31 ` Demi Marie Obenour
  4 siblings, 0 replies; 14+ messages in thread
From: Demi Marie Obenour @ 2022-12-22 22:31 UTC (permalink / raw)
  To: xen-devel
  Cc: Demi Marie Obenour, Marek Marczykowski-Górecki, Jan Beulich,
	Andrew Cooper, Roger Pau Monné,
	Wei Liu, Tim Deegan, George Dunlap

This is purely for testing, to see if it works around a bug in i915.  It
is not intended to be merged.

NOT-signed-off-by: DO NOT MERGE
---
 xen/arch/x86/include/asm/page.h      |  4 ++--
 xen/arch/x86/include/asm/processor.h | 10 +++++-----
 xen/arch/x86/mm.c                    |  8 --------
 3 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/xen/arch/x86/include/asm/page.h b/xen/arch/x86/include/asm/page.h
index b585235d064a567082582c8e92a4e8283fd949ca..ab9b46f1d0901e50a83fd035ff28d1bda0b781a2 100644
--- a/xen/arch/x86/include/asm/page.h
+++ b/xen/arch/x86/include/asm/page.h
@@ -333,11 +333,11 @@ void efi_update_l4_pgtable(unsigned int l4idx, l4_pgentry_t);
 
 /* Memory types, encoded under Xen's choice of MSR_PAT. */
 #define _PAGE_WB         (                                0)
-#define _PAGE_WT         (                        _PAGE_PWT)
+#define _PAGE_WC         (                        _PAGE_PWT)
 #define _PAGE_UCM        (            _PAGE_PCD            )
 #define _PAGE_UC         (            _PAGE_PCD | _PAGE_PWT)
-#define _PAGE_WC         (_PAGE_PAT                        )
 #define _PAGE_WP         (_PAGE_PAT |             _PAGE_PWT)
+#define _PAGE_WT         (_PAGE_PAT | _PAGE_PCD | _PAGE_PWT)
 
 /*
  * Debug option: Ensure that granted mappings are not implicitly unmapped.
diff --git a/xen/arch/x86/include/asm/processor.h b/xen/arch/x86/include/asm/processor.h
index 60b902060914584957db8afa5c7c1e6abdad4d13..3993d5638626f0948bb7ac8192d2eda187eb1bdb 100644
--- a/xen/arch/x86/include/asm/processor.h
+++ b/xen/arch/x86/include/asm/processor.h
@@ -94,16 +94,16 @@
 
 /*
  * Host IA32_CR_PAT value to cover all memory types.  This is not the default
- * MSR_PAT value, and is an ABI with PV guests.
+ * MSR_PAT value, and is needed by the Linux i915 driver.
  */
 #define XEN_MSR_PAT ((_AC(X86_MT_WB,  ULL) << 0x00) | \
-                     (_AC(X86_MT_WT,  ULL) << 0x08) | \
+                     (_AC(X86_MT_WC,  ULL) << 0x08) | \
                      (_AC(X86_MT_UCM, ULL) << 0x10) | \
                      (_AC(X86_MT_UC,  ULL) << 0x18) | \
-                     (_AC(X86_MT_WC,  ULL) << 0x20) | \
+                     (_AC(X86_MT_WB,  ULL) << 0x20) | \
                      (_AC(X86_MT_WP,  ULL) << 0x28) | \
-                     (_AC(X86_MT_UC,  ULL) << 0x30) | \
-                     (_AC(X86_MT_UC,  ULL) << 0x38))
+                     (_AC(X86_MT_UCM, ULL) << 0x30) | \
+                     (_AC(X86_MT_WT,  ULL) << 0x38))
 
 #ifndef __ASSEMBLY__
 
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index bacfb776d688f68dcbf79d83723fff329b75fd18..ea8c69e66c48419031add2e02da0a8eb6af8e49a 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -6417,14 +6417,6 @@ unsigned long get_upper_mfn_bound(void)
  */
 static void __init __maybe_unused build_assertions(void)
 {
-    /*
-     * If this trips, any guests that blindly rely on the public API in xen.h
-     * (instead of reading the PAT from Xen, as Linux 3.19+ does) will be
-     * broken.  Furthermore, live migration of PV guests between Xen versions
-     * using different PATs will not work.
-     */
-    BUILD_BUG_ON(XEN_MSR_PAT != 0x050100070406ULL);
-
     /*
      * _PAGE_WB must be zero for several reasons, not least because Linux
      * PV guests assume it.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 3/5] x86/mm: make code robust to future PAT changes
  2022-12-22 22:31 ` [PATCH v6 3/5] x86/mm: make code robust to future PAT changes Demi Marie Obenour
@ 2023-01-05 14:01   ` Jan Beulich
  2023-01-05 19:58     ` Andrew Cooper
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2023-01-05 14:01 UTC (permalink / raw)
  To: Demi Marie Obenour
  Cc: Marek Marczykowski-Górecki, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Tim Deegan, George Dunlap, xen-devel

On 22.12.2022 23:31, Demi Marie Obenour wrote:
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -6352,6 +6352,11 @@ unsigned long get_upper_mfn_bound(void)
>      return min(max_mfn, 1UL << (paddr_bits - PAGE_SHIFT)) - 1;
>  }
>  
> +
> +/*

Nit: Please avoid introducing double blank lines.

> + * A bunch of static assertions to check that the XEN_MSR_PAT is valid
> + * and consistent with the _PAGE_* macros, and that _PAGE_WB is zero.

This comment is too specific for a function of ...

> + */
>  static void __init __maybe_unused build_assertions(void)

... this name, in this file.

> @@ -6361,6 +6366,71 @@ static void __init __maybe_unused build_assertions(void)
>       * using different PATs will not work.
>       */
>      BUILD_BUG_ON(XEN_MSR_PAT != 0x050100070406ULL);
> +
> +    /*
> +     * _PAGE_WB must be zero for several reasons, not least because Linux
> +     * PV guests assume it.
> +     */
> +    BUILD_BUG_ON(_PAGE_WB);
> +
> +#define PAT_ENTRY(v)                                                           \
> +    (BUILD_BUG_ON_ZERO(((v) < 0) || ((v) > 7)) +                               \
> +     (0xFF & (XEN_MSR_PAT >> (8 * (v)))))
> +
> +    /* Validate at compile-time that v is a valid value for a PAT entry */
> +#define CHECK_PAT_ENTRY_VALUE(v)                                               \
> +    BUILD_BUG_ON((v) < 0 || (v) > 7 ||                                         \

See my v5 comments.

Jan


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 4/5] x86/mm: Reject invalid cacheability in PV guests by default
  2022-12-22 22:31 ` [PATCH v6 4/5] x86/mm: Reject invalid cacheability in PV guests by default Demi Marie Obenour
@ 2023-01-05 14:30   ` Jan Beulich
  2023-01-05 20:28   ` Andrew Cooper
  1 sibling, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2023-01-05 14:30 UTC (permalink / raw)
  To: Demi Marie Obenour
  Cc: Marek Marczykowski-Górecki, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Tim Deegan, George Dunlap, xen-devel

On 22.12.2022 23:31, Demi Marie Obenour wrote:
> Setting cacheability flags that are not ones specified by Xen is a bug
> in the guest.  By default, return -EINVAL if a guests attempts to do
> this.  The invalid-cacheability= Xen command-line flag allows the
> administrator to allow such attempts or to produce

Unfinished sentence?

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -1324,6 +1324,37 @@ static int put_page_from_l4e(l4_pgentry_t l4e, mfn_t l4mfn, unsigned int flags)
>      return put_pt_page(l4e_get_page(l4e), mfn_to_page(l4mfn), flags);
>  }
>  
> +enum {
> +    INVALID_CACHEABILITY_ALLOW,
> +    INVALID_CACHEABILITY_DENY,
> +    INVALID_CACHEABILITY_TRAP,
> +};
> +
> +#ifdef NDEBUG
> +#define INVALID_CACHEABILITY_DEFAULT INVALID_CACHEABILITY_DENY
> +#else
> +#define INVALID_CACHEABILITY_DEFAULT INVALID_CACHEABILITY_TRAP
> +#endif
> +
> +static __ro_after_init uint8_t invalid_cacheability =
> +    INVALID_CACHEABILITY_DEFAULT;
> +
> +static int __init cf_check set_invalid_cacheability(const char *str)
> +{
> +    if (strcmp("allow", str) == 0)
> +        invalid_cacheability = INVALID_CACHEABILITY_ALLOW;
> +    else if (strcmp("deny", str) == 0)
> +        invalid_cacheability = INVALID_CACHEABILITY_DENY;
> +    else if (strcmp("trap", str) == 0)
> +        invalid_cacheability = INVALID_CACHEABILITY_TRAP;

Style: Missing blanks immediately inside if(). Also note that generally
we prefer '!' over "== 0".

> +    else
> +        return -EINVAL;
> +
> +    return 0;
> +}
> +
> +custom_param("invalid-cacheability", set_invalid_cacheability);

Nit: Generally we avoid blank lines between the handler of a
custom_param() and the actual param definition.

> @@ -1343,7 +1374,34 @@ static int promote_l1_table(struct page_info *page)
>          }
>          else
>          {
> -            switch ( ret = get_page_from_l1e(pl1e[i], d, d) )
> +            l1_pgentry_t l1e = pl1e[i];
> +
> +            if ( invalid_cacheability != INVALID_CACHEABILITY_ALLOW )
> +            {
> +                switch ( l1e.l1 & PAGE_CACHE_ATTRS )

You want to use l1e_get_flags() here.

Jan


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 3/5] x86/mm: make code robust to future PAT changes
  2023-01-05 14:01   ` Jan Beulich
@ 2023-01-05 19:58     ` Andrew Cooper
  0 siblings, 0 replies; 14+ messages in thread
From: Andrew Cooper @ 2023-01-05 19:58 UTC (permalink / raw)
  To: Jan Beulich, Demi Marie Obenour
  Cc: Marek Marczykowski-Górecki, Roger Pau Monne, Wei Liu,
	Tim (Xen.org),
	George Dunlap, xen-devel

On 05/01/2023 2:01 pm, Jan Beulich wrote:
> On 22.12.2022 23:31, Demi Marie Obenour wrote:
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -6352,6 +6352,11 @@ unsigned long get_upper_mfn_bound(void)
>>      return min(max_mfn, 1UL << (paddr_bits - PAGE_SHIFT)) - 1;
>>  }
>>  
>> +
>> +/*
> Nit: Please avoid introducing double blank lines.
>
>> + * A bunch of static assertions to check that the XEN_MSR_PAT is valid
>> + * and consistent with the _PAGE_* macros, and that _PAGE_WB is zero.
> This comment is too specific for a function of ...
>
>> + */
>>  static void __init __maybe_unused build_assertions(void)
> ... this name, in this file.

IMO, you really don't need to comment build_assertions().  It's a
pattern we use elsewhere, and the BUILD_BUG_ON()'s are individually
commented.

I'd just drop this hunk entirely.

~Andrew

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 4/5] x86/mm: Reject invalid cacheability in PV guests by default
  2022-12-22 22:31 ` [PATCH v6 4/5] x86/mm: Reject invalid cacheability in PV guests by default Demi Marie Obenour
  2023-01-05 14:30   ` Jan Beulich
@ 2023-01-05 20:28   ` Andrew Cooper
  2023-01-06  2:00     ` Demi Marie Obenour
  2023-01-06  7:11     ` Jan Beulich
  1 sibling, 2 replies; 14+ messages in thread
From: Andrew Cooper @ 2023-01-05 20:28 UTC (permalink / raw)
  To: Demi Marie Obenour, xen-devel
  Cc: Marek Marczykowski-Górecki, Jan Beulich, Roger Pau Monne,
	Wei Liu, Tim (Xen.org),
	George Dunlap

On 22/12/2022 10:31 pm, Demi Marie Obenour wrote:
> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> index 424b12cfb27d6ade2ec63eacb8afe5df82465451..0230a7bc17cbd4362a42ea64cea695f31f5e0f86 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -1417,6 +1417,17 @@ detection of systems known to misbehave upon accesses to that port.
>  ### idle_latency_factor (x86)
>  > `= <integer>`
>  
> +### invalid-cacheability (x86)
> +> `= allow | deny | trap`
> +
> +> Default: `deny` in release builds, otherwise `trap`
> +
> +Specify what happens when a PV guest tries to use one of the reserved entries in
> +the PAT.  `deny` causes the attempt to be rejected with -EINVAL, `allow` allows
> +the attempt, and `trap` causes a general protection fault to be raised.
> +Currently, the reserved entries are marked as uncacheable in Xen's PAT, but this
> +will change if new memory types are added, so guests must not rely on it.
> +

This wants to be scoped under "pv", and not a top-level option.  Current
parsing is at the top of xen/arch/x86/pv/domain.c

How about `pv={no-}oob-pat`, and parse it into the opt_pv_oob_pat boolean ?

There really is no need to distinguish between deny and trap.  IMO,
injecting #GP unilaterally is fine (to a guest expecting the hypercall
to succeed, -EINVAL vs #GP makes no difference, but #GP is far more
useful to a human trying to debug issues here), but I could be talked
into putting that behind a CONFIG_DEBUG if other feel strongly.

> @@ -1343,7 +1374,34 @@ static int promote_l1_table(struct page_info *page)
>          }
>          else
>          {
> -            switch ( ret = get_page_from_l1e(pl1e[i], d, d) )
> +            l1_pgentry_t l1e = pl1e[i];
> +
> +            if ( invalid_cacheability != INVALID_CACHEABILITY_ALLOW )
> +            {
> +                switch ( l1e.l1 & PAGE_CACHE_ATTRS )
> +                {
> +                case _PAGE_WB:
> +                case _PAGE_UC:
> +                case _PAGE_UCM:
> +                case _PAGE_WC:
> +                case _PAGE_WT:
> +                case _PAGE_WP:
> +                    break;
> +                default:
> +                    /*
> +                     * If we get here, a PV guest tried to use one of the
> +                     * reserved values in Xen's PAT.  This indicates a bug
> +                     * in the guest.  If requested by the user, inject #GP
> +                     * to cause the guest to log a stack trace.
> +                     */
> +                    if ( invalid_cacheability == INVALID_CACHEABILITY_TRAP )
> +                        pv_inject_hw_exception(TRAP_gp_fault, 0);
> +                    ret = -EINVAL;
> +                    goto fail;
> +                }
> +            }

This will catch cases where the guest writes a full pagetable, then
promotes it, but won't catch cases where the guest modifies an
individual entry with mmu_update/etc.

The logic needs to be in get_page_from_l1e() to get applied uniformly to
all PTE modifications.

Right now, the l1_disallow_mask() check near the start hides the "can
you use a nonzero cacheattr" check.  If I ever get around to cleaning up
my post-XSA-402 series, I have a load of modifications to this.

But this could be something like this:

if ( opt_pv_oob_pat && (l1f & PAGE_CACHE_ATTRS) > _PAGE_WP )
{
    // #GP
    return -EINVAL;
}

fairly early on.

It occurs to me that this check is only applicable when we're using the
Xen ABI APT, and seeing as we've talked about possibly making patch 5 a
Kconfig option, that may want accounting here.  (Perhaps simply making
opt_pb_oob_pat be false in a !XEN_PAT build.)

~Andrew

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 4/5] x86/mm: Reject invalid cacheability in PV guests by default
  2023-01-05 20:28   ` Andrew Cooper
@ 2023-01-06  2:00     ` Demi Marie Obenour
  2023-01-06  2:30       ` Marek Marczykowski-Górecki
  2023-01-06  7:11     ` Jan Beulich
  1 sibling, 1 reply; 14+ messages in thread
From: Demi Marie Obenour @ 2023-01-06  2:00 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel
  Cc: Marek Marczykowski-Górecki, Jan Beulich, Roger Pau Monne,
	Wei Liu, Tim (Xen.org),
	George Dunlap

[-- Attachment #1: Type: text/plain, Size: 4781 bytes --]

On Thu, Jan 05, 2023 at 08:28:26PM +0000, Andrew Cooper wrote:
> On 22/12/2022 10:31 pm, Demi Marie Obenour wrote:
> > diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> > index 424b12cfb27d6ade2ec63eacb8afe5df82465451..0230a7bc17cbd4362a42ea64cea695f31f5e0f86 100644
> > --- a/docs/misc/xen-command-line.pandoc
> > +++ b/docs/misc/xen-command-line.pandoc
> > @@ -1417,6 +1417,17 @@ detection of systems known to misbehave upon accesses to that port.
> >  ### idle_latency_factor (x86)
> >  > `= <integer>`
> >  
> > +### invalid-cacheability (x86)
> > +> `= allow | deny | trap`
> > +
> > +> Default: `deny` in release builds, otherwise `trap`
> > +
> > +Specify what happens when a PV guest tries to use one of the reserved entries in
> > +the PAT.  `deny` causes the attempt to be rejected with -EINVAL, `allow` allows
> > +the attempt, and `trap` causes a general protection fault to be raised.
> > +Currently, the reserved entries are marked as uncacheable in Xen's PAT, but this
> > +will change if new memory types are added, so guests must not rely on it.
> > +
> 
> This wants to be scoped under "pv", and not a top-level option.  Current
> parsing is at the top of xen/arch/x86/pv/domain.c
> 
> How about `pv={no-}oob-pat`, and parse it into the opt_pv_oob_pat boolean ?

Works for me, though I will use ‘invalid’ instead of ‘oob’ as valid PAT
entries might not be contiguous.

> There really is no need to distinguish between deny and trap.  IMO,
> injecting #GP unilaterally is fine (to a guest expecting the hypercall
> to succeed, -EINVAL vs #GP makes no difference, but #GP is far more
> useful to a human trying to debug issues here), but I could be talked
> into putting that behind a CONFIG_DEBUG if other feel strongly.

Marek, thoughts?

> > @@ -1343,7 +1374,34 @@ static int promote_l1_table(struct page_info *page)
> >          }
> >          else
> >          {
> > -            switch ( ret = get_page_from_l1e(pl1e[i], d, d) )
> > +            l1_pgentry_t l1e = pl1e[i];
> > +
> > +            if ( invalid_cacheability != INVALID_CACHEABILITY_ALLOW )
> > +            {
> > +                switch ( l1e.l1 & PAGE_CACHE_ATTRS )
> > +                {
> > +                case _PAGE_WB:
> > +                case _PAGE_UC:
> > +                case _PAGE_UCM:
> > +                case _PAGE_WC:
> > +                case _PAGE_WT:
> > +                case _PAGE_WP:
> > +                    break;
> > +                default:
> > +                    /*
> > +                     * If we get here, a PV guest tried to use one of the
> > +                     * reserved values in Xen's PAT.  This indicates a bug
> > +                     * in the guest.  If requested by the user, inject #GP
> > +                     * to cause the guest to log a stack trace.
> > +                     */
> > +                    if ( invalid_cacheability == INVALID_CACHEABILITY_TRAP )
> > +                        pv_inject_hw_exception(TRAP_gp_fault, 0);
> > +                    ret = -EINVAL;
> > +                    goto fail;
> > +                }
> > +            }
> 
> This will catch cases where the guest writes a full pagetable, then
> promotes it, but won't catch cases where the guest modifies an
> individual entry with mmu_update/etc.
> 
> The logic needs to be in get_page_from_l1e() to get applied uniformly to
> all PTE modifications.

I will move it there, and also update Qubes OS’s patchset.

> Right now, the l1_disallow_mask() check near the start hides the "can
> you use a nonzero cacheattr" check.  If I ever get around to cleaning up
> my post-XSA-402 series, I have a load of modifications to this.

I came up with some major cleanups too.

> But this could be something like this:
> 
> if ( opt_pv_oob_pat && (l1f & PAGE_CACHE_ATTRS) > _PAGE_WP )
> {
>     // #GP
>     return -EINVAL;
> }
> 
> fairly early on.
> 
> It occurs to me that this check is only applicable when we're using the
> Xen ABI APT, and seeing as we've talked about possibly making patch 5 a
> Kconfig option, that may want accounting here.  (Perhaps simply making
> opt_pb_oob_pat be false in a !XEN_PAT build.)

It’s actually applicable even with other PATs.  While Marek and I were
tracking down an Intel iGPU cache coherency problem, Marek used it to
verify that PAT entries that we thought were not being used were in fact
unused.  This allowed proving that the behavior of the GPU was impacted
by changes to PAT entries the hardware should not even be looking at,
and therefore that the hardware itself must be buggy.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 4/5] x86/mm: Reject invalid cacheability in PV guests by default
  2023-01-06  2:00     ` Demi Marie Obenour
@ 2023-01-06  2:30       ` Marek Marczykowski-Górecki
  2023-01-06  5:23         ` Demi Marie Obenour
  0 siblings, 1 reply; 14+ messages in thread
From: Marek Marczykowski-Górecki @ 2023-01-06  2:30 UTC (permalink / raw)
  To: Demi Marie Obenour
  Cc: Andrew Cooper, xen-devel, Jan Beulich, Roger Pau Monne, Wei Liu,
	Tim (Xen.org),
	George Dunlap

[-- Attachment #1: Type: text/plain, Size: 6312 bytes --]

On Thu, Jan 05, 2023 at 09:00:03PM -0500, Demi Marie Obenour wrote:
> On Thu, Jan 05, 2023 at 08:28:26PM +0000, Andrew Cooper wrote:
> > On 22/12/2022 10:31 pm, Demi Marie Obenour wrote:
> > > diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> > > index 424b12cfb27d6ade2ec63eacb8afe5df82465451..0230a7bc17cbd4362a42ea64cea695f31f5e0f86 100644
> > > --- a/docs/misc/xen-command-line.pandoc
> > > +++ b/docs/misc/xen-command-line.pandoc
> > > @@ -1417,6 +1417,17 @@ detection of systems known to misbehave upon accesses to that port.
> > >  ### idle_latency_factor (x86)
> > >  > `= <integer>`
> > >  
> > > +### invalid-cacheability (x86)
> > > +> `= allow | deny | trap`
> > > +
> > > +> Default: `deny` in release builds, otherwise `trap`
> > > +
> > > +Specify what happens when a PV guest tries to use one of the reserved entries in
> > > +the PAT.  `deny` causes the attempt to be rejected with -EINVAL, `allow` allows
> > > +the attempt, and `trap` causes a general protection fault to be raised.
> > > +Currently, the reserved entries are marked as uncacheable in Xen's PAT, but this
> > > +will change if new memory types are added, so guests must not rely on it.
> > > +
> > 
> > This wants to be scoped under "pv", and not a top-level option.  Current
> > parsing is at the top of xen/arch/x86/pv/domain.c
> > 
> > How about `pv={no-}oob-pat`, and parse it into the opt_pv_oob_pat boolean ?
> 
> Works for me, though I will use ‘invalid’ instead of ‘oob’ as valid PAT
> entries might not be contiguous.

If we're talking about alternative PAT settings, I'm not sure if they
can be called "invalid". With the default Xen's choice of PAT, top two
entries are documented as reserved (in xen.h) and only that makes them
forbidden to use. But with alternative settings, it's only behavior of
Linux parsing that prefers using lower options. When breaking contract
set in xen.h, "reserved" entries stop being reserved too.

So, _if_ that option would be applicable alternative PAT choice, it's
only useful for debugging Linux specifically (assuming Linux won't
change its approach to choosing entries - which I think it's allowed to
do).

> > There really is no need to distinguish between deny and trap.  IMO,
> > injecting #GP unilaterally is fine (to a guest expecting the hypercall
> > to succeed, -EINVAL vs #GP makes no difference, but #GP is far more
> > useful to a human trying to debug issues here), but I could be talked
> > into putting that behind a CONFIG_DEBUG if other feel strongly.
> 
> Marek, thoughts?

With Xen's default PAT, #GP may be useful indeed, but it must come with
a message why it was injected.

> > > @@ -1343,7 +1374,34 @@ static int promote_l1_table(struct page_info *page)
> > >          }
> > >          else
> > >          {
> > > -            switch ( ret = get_page_from_l1e(pl1e[i], d, d) )
> > > +            l1_pgentry_t l1e = pl1e[i];
> > > +
> > > +            if ( invalid_cacheability != INVALID_CACHEABILITY_ALLOW )
> > > +            {
> > > +                switch ( l1e.l1 & PAGE_CACHE_ATTRS )
> > > +                {
> > > +                case _PAGE_WB:
> > > +                case _PAGE_UC:
> > > +                case _PAGE_UCM:
> > > +                case _PAGE_WC:
> > > +                case _PAGE_WT:
> > > +                case _PAGE_WP:
> > > +                    break;
> > > +                default:
> > > +                    /*
> > > +                     * If we get here, a PV guest tried to use one of the
> > > +                     * reserved values in Xen's PAT.  This indicates a bug
> > > +                     * in the guest.  If requested by the user, inject #GP
> > > +                     * to cause the guest to log a stack trace.
> > > +                     */
> > > +                    if ( invalid_cacheability == INVALID_CACHEABILITY_TRAP )
> > > +                        pv_inject_hw_exception(TRAP_gp_fault, 0);
> > > +                    ret = -EINVAL;
> > > +                    goto fail;
> > > +                }
> > > +            }
> > 
> > This will catch cases where the guest writes a full pagetable, then
> > promotes it, but won't catch cases where the guest modifies an
> > individual entry with mmu_update/etc.
> > 
> > The logic needs to be in get_page_from_l1e() to get applied uniformly to
> > all PTE modifications.
> 
> I will move it there, and also update Qubes OS’s patchset.
> 
> > Right now, the l1_disallow_mask() check near the start hides the "can
> > you use a nonzero cacheattr" check.  If I ever get around to cleaning up
> > my post-XSA-402 series, I have a load of modifications to this.
> 
> I came up with some major cleanups too.
> 
> > But this could be something like this:
> > 
> > if ( opt_pv_oob_pat && (l1f & PAGE_CACHE_ATTRS) > _PAGE_WP )
> > {
> >     // #GP
> >     return -EINVAL;
> > }
> > 
> > fairly early on.
> > 
> > It occurs to me that this check is only applicable when we're using the
> > Xen ABI APT, and seeing as we've talked about possibly making patch 5 a
> > Kconfig option, that may want accounting here.  (Perhaps simply making
> > opt_pb_oob_pat be false in a !XEN_PAT build.)
> 
> It’s actually applicable even with other PATs.  While Marek and I were
> tracking down an Intel iGPU cache coherency problem, Marek used it to
> verify that PAT entries that we thought were not being used were in fact
> unused.  This allowed proving that the behavior of the GPU was impacted
> by changes to PAT entries the hardware should not even be looking at,
> and therefore that the hardware itself must be buggy.

In fact, I did that via WARN() on the Linux side, to _not_ have guest
killed in this case, to potentially collect more info. As said above,
with alternative PAT settings, the contract which entries are "valid"
isn't there anymore, so punishing guest for using them isn't
appropriate. It could still be useful feature for debugging Linux (and
it feels, we'll need this feature for some time...). So, with !XEN_PAT
it should be at least disabled by default, or maybe even hidden behind
CONFIG_DEBUG.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 4/5] x86/mm: Reject invalid cacheability in PV guests by default
  2023-01-06  2:30       ` Marek Marczykowski-Górecki
@ 2023-01-06  5:23         ` Demi Marie Obenour
  0 siblings, 0 replies; 14+ messages in thread
From: Demi Marie Obenour @ 2023-01-06  5:23 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Andrew Cooper, xen-devel, Jan Beulich, Roger Pau Monne, Wei Liu,
	Tim (Xen.org),
	George Dunlap

[-- Attachment #1: Type: text/plain, Size: 6791 bytes --]

On Fri, Jan 06, 2023 at 03:30:01AM +0100, Marek Marczykowski-Górecki wrote:
> On Thu, Jan 05, 2023 at 09:00:03PM -0500, Demi Marie Obenour wrote:
> > On Thu, Jan 05, 2023 at 08:28:26PM +0000, Andrew Cooper wrote:
> > > On 22/12/2022 10:31 pm, Demi Marie Obenour wrote:
> > > > diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
> > > > index 424b12cfb27d6ade2ec63eacb8afe5df82465451..0230a7bc17cbd4362a42ea64cea695f31f5e0f86 100644
> > > > --- a/docs/misc/xen-command-line.pandoc
> > > > +++ b/docs/misc/xen-command-line.pandoc
> > > > @@ -1417,6 +1417,17 @@ detection of systems known to misbehave upon accesses to that port.
> > > >  ### idle_latency_factor (x86)
> > > >  > `= <integer>`
> > > >  
> > > > +### invalid-cacheability (x86)
> > > > +> `= allow | deny | trap`
> > > > +
> > > > +> Default: `deny` in release builds, otherwise `trap`
> > > > +
> > > > +Specify what happens when a PV guest tries to use one of the reserved entries in
> > > > +the PAT.  `deny` causes the attempt to be rejected with -EINVAL, `allow` allows
> > > > +the attempt, and `trap` causes a general protection fault to be raised.
> > > > +Currently, the reserved entries are marked as uncacheable in Xen's PAT, but this
> > > > +will change if new memory types are added, so guests must not rely on it.
> > > > +
> > > 
> > > This wants to be scoped under "pv", and not a top-level option.  Current
> > > parsing is at the top of xen/arch/x86/pv/domain.c
> > > 
> > > How about `pv={no-}oob-pat`, and parse it into the opt_pv_oob_pat boolean ?
> > 
> > Works for me, though I will use ‘invalid’ instead of ‘oob’ as valid PAT
> > entries might not be contiguous.
> 
> If we're talking about alternative PAT settings, I'm not sure if they
> can be called "invalid". With the default Xen's choice of PAT, top two
> entries are documented as reserved (in xen.h) and only that makes them
> forbidden to use. But with alternative settings, it's only behavior of
> Linux parsing that prefers using lower options. When breaking contract
> set in xen.h, "reserved" entries stop being reserved too.

That makes sense.

> So, _if_ that option would be applicable alternative PAT choice, it's
> only useful for debugging Linux specifically (assuming Linux won't
> change its approach to choosing entries - which I think it's allowed to
> do).

Point taken.

> > > There really is no need to distinguish between deny and trap.  IMO,
> > > injecting #GP unilaterally is fine (to a guest expecting the hypercall
> > > to succeed, -EINVAL vs #GP makes no difference, but #GP is far more
> > > useful to a human trying to debug issues here), but I could be talked
> > > into putting that behind a CONFIG_DEBUG if other feel strongly.
> > 
> > Marek, thoughts?
> 
> With Xen's default PAT, #GP may be useful indeed, but it must come with
> a message why it was injected.

In xl dmesg?

> > > > @@ -1343,7 +1374,34 @@ static int promote_l1_table(struct page_info *page)
> > > >          }
> > > >          else
> > > >          {
> > > > -            switch ( ret = get_page_from_l1e(pl1e[i], d, d) )
> > > > +            l1_pgentry_t l1e = pl1e[i];
> > > > +
> > > > +            if ( invalid_cacheability != INVALID_CACHEABILITY_ALLOW )
> > > > +            {
> > > > +                switch ( l1e.l1 & PAGE_CACHE_ATTRS )
> > > > +                {
> > > > +                case _PAGE_WB:
> > > > +                case _PAGE_UC:
> > > > +                case _PAGE_UCM:
> > > > +                case _PAGE_WC:
> > > > +                case _PAGE_WT:
> > > > +                case _PAGE_WP:
> > > > +                    break;
> > > > +                default:
> > > > +                    /*
> > > > +                     * If we get here, a PV guest tried to use one of the
> > > > +                     * reserved values in Xen's PAT.  This indicates a bug
> > > > +                     * in the guest.  If requested by the user, inject #GP
> > > > +                     * to cause the guest to log a stack trace.
> > > > +                     */
> > > > +                    if ( invalid_cacheability == INVALID_CACHEABILITY_TRAP )
> > > > +                        pv_inject_hw_exception(TRAP_gp_fault, 0);
> > > > +                    ret = -EINVAL;
> > > > +                    goto fail;
> > > > +                }
> > > > +            }
> > > 
> > > This will catch cases where the guest writes a full pagetable, then
> > > promotes it, but won't catch cases where the guest modifies an
> > > individual entry with mmu_update/etc.
> > > 
> > > The logic needs to be in get_page_from_l1e() to get applied uniformly to
> > > all PTE modifications.
> > 
> > I will move it there, and also update Qubes OS’s patchset.
> > 
> > > Right now, the l1_disallow_mask() check near the start hides the "can
> > > you use a nonzero cacheattr" check.  If I ever get around to cleaning up
> > > my post-XSA-402 series, I have a load of modifications to this.
> > 
> > I came up with some major cleanups too.
> > 
> > > But this could be something like this:
> > > 
> > > if ( opt_pv_oob_pat && (l1f & PAGE_CACHE_ATTRS) > _PAGE_WP )
> > > {
> > >     // #GP
> > >     return -EINVAL;
> > > }
> > > 
> > > fairly early on.
> > > 
> > > It occurs to me that this check is only applicable when we're using the
> > > Xen ABI APT, and seeing as we've talked about possibly making patch 5 a
> > > Kconfig option, that may want accounting here.  (Perhaps simply making
> > > opt_pb_oob_pat be false in a !XEN_PAT build.)
> > 
> > It’s actually applicable even with other PATs.  While Marek and I were
> > tracking down an Intel iGPU cache coherency problem, Marek used it to
> > verify that PAT entries that we thought were not being used were in fact
> > unused.  This allowed proving that the behavior of the GPU was impacted
> > by changes to PAT entries the hardware should not even be looking at,
> > and therefore that the hardware itself must be buggy.
> 
> In fact, I did that via WARN() on the Linux side, to _not_ have guest
> killed in this case, to potentially collect more info.

Whoops!  I must have misunderstood what you meant by "trap".

> As said above,
> with alternative PAT settings, the contract which entries are "valid"
> isn't there anymore, so punishing guest for using them isn't
> appropriate. It could still be useful feature for debugging Linux (and
> it feels, we'll need this feature for some time...). So, with !XEN_PAT
> it should be at least disabled by default, or maybe even hidden behind
> CONFIG_DEBUG.

Okay, makes sense.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v6 4/5] x86/mm: Reject invalid cacheability in PV guests by default
  2023-01-05 20:28   ` Andrew Cooper
  2023-01-06  2:00     ` Demi Marie Obenour
@ 2023-01-06  7:11     ` Jan Beulich
  1 sibling, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2023-01-06  7:11 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Marek Marczykowski-Górecki, Roger Pau Monne, Wei Liu,
	Tim (Xen.org),
	George Dunlap, Demi Marie Obenour, xen-devel

On 05.01.2023 21:28, Andrew Cooper wrote:
> On 22/12/2022 10:31 pm, Demi Marie Obenour wrote:
>> diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
>> index 424b12cfb27d6ade2ec63eacb8afe5df82465451..0230a7bc17cbd4362a42ea64cea695f31f5e0f86 100644
>> --- a/docs/misc/xen-command-line.pandoc
>> +++ b/docs/misc/xen-command-line.pandoc
>> @@ -1417,6 +1417,17 @@ detection of systems known to misbehave upon accesses to that port.
>>  ### idle_latency_factor (x86)
>>  > `= <integer>`
>>  
>> +### invalid-cacheability (x86)
>> +> `= allow | deny | trap`
>> +
>> +> Default: `deny` in release builds, otherwise `trap`
>> +
>> +Specify what happens when a PV guest tries to use one of the reserved entries in
>> +the PAT.  `deny` causes the attempt to be rejected with -EINVAL, `allow` allows
>> +the attempt, and `trap` causes a general protection fault to be raised.
>> +Currently, the reserved entries are marked as uncacheable in Xen's PAT, but this
>> +will change if new memory types are added, so guests must not rely on it.
>> +
> 
> This wants to be scoped under "pv", and not a top-level option.  Current
> parsing is at the top of xen/arch/x86/pv/domain.c
> 
> How about `pv={no-}oob-pat`, and parse it into the opt_pv_oob_pat boolean ?
> 
> There really is no need to distinguish between deny and trap.  IMO,
> injecting #GP unilaterally is fine (to a guest expecting the hypercall
> to succeed, -EINVAL vs #GP makes no difference, but #GP is far more
> useful to a human trying to debug issues here), but I could be talked
> into putting that behind a CONFIG_DEBUG if other feel strongly.

What is or is not useful to guests is guesswork. They might be "handling"
failure with BUG(), in which case they'd still see a backtrace. So to me,
as previously said, injecting #GP in the case here can only reasonably be
a debugging aid (and hence be dependent upon DEBUG).

Jan


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-01-06  7:11 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-22 22:31 [PATCH v6 0/5] Make PAT handling less brittle Demi Marie Obenour
2022-12-22 22:31 ` [PATCH v6 1/5] x86/mm: Avoid hard-coding PAT in get_page_from_l1e() Demi Marie Obenour
2022-12-22 22:31 ` [PATCH v6 2/5] x86: Remove MEMORY_NUM_TYPES and NO_HARDCODE_MEM_TYPE Demi Marie Obenour
2022-12-22 22:31 ` [PATCH v6 3/5] x86/mm: make code robust to future PAT changes Demi Marie Obenour
2023-01-05 14:01   ` Jan Beulich
2023-01-05 19:58     ` Andrew Cooper
2022-12-22 22:31 ` [PATCH v6 4/5] x86/mm: Reject invalid cacheability in PV guests by default Demi Marie Obenour
2023-01-05 14:30   ` Jan Beulich
2023-01-05 20:28   ` Andrew Cooper
2023-01-06  2:00     ` Demi Marie Obenour
2023-01-06  2:30       ` Marek Marczykowski-Górecki
2023-01-06  5:23         ` Demi Marie Obenour
2023-01-06  7:11     ` Jan Beulich
2022-12-22 22:31 ` [PATCH v6 5/5] x86: Use Linux's PAT Demi Marie Obenour

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.