xen-devel.lists.xenproject.org archive mirror
* [PATCH v3 0/2] x86/pv: Alternative XSA-286 fix
@ 2020-10-27 14:10 Andrew Cooper
  2020-10-27 14:10 ` [PATCH v3 1/2] x86/pv: Drop FLUSH_TLB_GLOBAL in do_mmu_update() for XPTI Andrew Cooper
  2020-10-27 14:10 ` [PATCH v3 2/2] x86/pv: Flush TLB in response to paging structure changes Andrew Cooper
  0 siblings, 2 replies; 5+ messages in thread
From: Andrew Cooper @ 2020-10-27 14:10 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné, Wei Liu

Alternative XSA-286 fix, with far less of a performance hit.

v3:
 * Split into two patches, to try to make the TLB flushing in do_mmu_update()
   a tractable problem to solve.

Andrew Cooper (2):
  x86/pv: Drop FLUSH_TLB_GLOBAL in do_mmu_update() for XPTI
  x86/pv: Flush TLB in response to paging structure changes

 xen/arch/x86/mm.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 59 insertions(+), 10 deletions(-)

-- 
2.11.0




* [PATCH v3 1/2] x86/pv: Drop FLUSH_TLB_GLOBAL in do_mmu_update() for XPTI
  2020-10-27 14:10 [PATCH v3 0/2] x86/pv: Alternative XSA-286 fix Andrew Cooper
@ 2020-10-27 14:10 ` Andrew Cooper
  2020-10-27 14:37   ` Jan Beulich
  2020-10-27 14:10 ` [PATCH v3 2/2] x86/pv: Flush TLB in response to paging structure changes Andrew Cooper
  1 sibling, 1 reply; 5+ messages in thread
From: Andrew Cooper @ 2020-10-27 14:10 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné, Wei Liu

c/s 9d1d31ad9498 "x86: slightly reduce Meltdown band-aid overhead" removed the
use of Global TLB flushes on the Xen entry path, but added a FLUSH_TLB_GLOBAL
to the L4 path in do_mmu_update().

However, this was unnecessary.

The L4 resync will pick up any new mappings created by the L4 change.  Any
changes to existing mappings are the guest's responsibility to flush, and if
one is needed, an MMUEXT_OP hypercall will follow.
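
As an illustrative sketch of that guest-side pattern (not part of this patch;
the helper name is hypothetical and the HYPERVISOR_* wrappers are assumed to
come from the guest's own Xen headers), a guest rewriting an already-present
PTE would pair the update with its own invalidation:

    /* Types/constants from the Xen public interface; hypercall wrappers from
     * the guest environment.  Sketch only. */
    static int remap_existing_pte(uint64_t pte_machine_addr,
                                  uint64_t new_pte_val, unsigned long va)
    {
        struct mmu_update u = {
            .ptr = pte_machine_addr | MMU_NORMAL_PT_UPDATE,
            .val = new_pte_val,
        };
        struct mmuext_op op = {
            .cmd = MMUEXT_INVLPG_LOCAL,
            .arg1.linear_addr = va,
        };
        int rc = HYPERVISOR_mmu_update(&u, 1, NULL, DOMID_SELF);

        if ( rc )
            return rc;

        /* Flushing a changed mapping is the guest's job, not Xen's. */
        return HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF);
    }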

This is (not really) XSA-286 (but necessary to simplify the logic).

Fixes: 9d1d31ad9498 ("x86: slightly reduce Meltdown band-aid overhead")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>

v3:
 * New
---
 xen/arch/x86/mm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index b2f35b3e7d..38168189aa 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4188,7 +4188,7 @@ long do_mmu_update(
 
         cpumask_andnot(mask, pt_owner->dirty_cpumask, cpumask_of(cpu));
         if ( !cpumask_empty(mask) )
-            flush_mask(mask, FLUSH_TLB_GLOBAL | FLUSH_ROOT_PGTBL);
+            flush_mask(mask, FLUSH_ROOT_PGTBL);
     }
 
     perfc_add(num_page_updates, i);
-- 
2.11.0




* [PATCH v3 2/2] x86/pv: Flush TLB in response to paging structure changes
  2020-10-27 14:10 [PATCH v3 0/2] x86/pv: Alternative XSA-286 fix Andrew Cooper
  2020-10-27 14:10 ` [PATCH v3 1/2] x86/pv: Drop FLUSH_TLB_GLOBAL in do_mmu_update() for XPTI Andrew Cooper
@ 2020-10-27 14:10 ` Andrew Cooper
  2020-10-27 14:46   ` Jan Beulich
  1 sibling, 1 reply; 5+ messages in thread
From: Andrew Cooper @ 2020-10-27 14:10 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné, Wei Liu

With MMU_UPDATE, a PV guest can make changes to higher level pagetables.  This
is safe from Xen's point of view (as the update only affects guest mappings),
and the guest is required to flush (if necessary) after making updates.

However, Xen's use of linear pagetables (UPDATE_VA_MAPPING, GNTTABOP_map,
writeable pagetables, etc.) is an implementation detail outside of the
API/ABI.

Changes in the paging structure require invalidations in the linear pagetable
range for subsequent accesses into the linear pagetables to access non-stale
mappings.  Xen must provide suitable flushing to prevent intermixed guest
actions from accidentally accessing/modifying the wrong pagetable.

For all L2 and higher modifications, flush the TLB.  PV guests cannot create
L2 or higher entries with the Global bit set, so no mappings established in
the linear range can be global.  (This could in principle be an order 39 flush
starting at LINEAR_PT_VIRT_START, but no such mechanism exists in practice.)
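
As a purely explanatory sketch (not code from this patch) of why such stale
translations matter: the recursive mapping means the L1 entry covering any
virtual address can itself be read at a fixed, computable linear address.
The base constant below is illustrative; the real one is LINEAR_PT_VIRT_START
in Xen's config.h:

    #include <stdint.h>

    #define PAGE_SHIFT              12
    #define PAGETABLE_ORDER          9
    /* Illustrative placement of the recursive (self-referencing) slot. */
    #define LINEAR_PT_BASE          0xffff810000000000UL

    /* Linear address at which the L1 entry covering 'va' is visible. */
    static uint64_t *linear_l1_entry(unsigned long va)
    {
        unsigned long idx = (va >> PAGE_SHIFT) &
                            ((1UL << (4 * PAGETABLE_ORDER)) - 1);

        return (uint64_t *)(LINEAR_PT_BASE + idx * sizeof(uint64_t));
    }

If the guest swaps an L2/L3/L4 entry on the path such an address resolves
through and no flush is performed, the TLB can keep translating that part of
the linear range to the old L1 page, so a later UPDATE_VA_MAPPING or grant
map would read or write a stale pagetable.  Hence the flush whenever
flush_linear_pt is set.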

Express the necessary flushes as a set of booleans which accumulate across the
operation.  Comment the flushing logic extensively.

This is XSA-286.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>

v3:
 * Rework from scratch vs v2.
---
 xen/arch/x86/mm.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 59 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 38168189aa..a3704ef648 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3891,7 +3891,8 @@ long do_mmu_update(
     struct vcpu *curr = current, *v = curr;
     struct domain *d = v->domain, *pt_owner = d, *pg_owner;
     mfn_t map_mfn = INVALID_MFN, mfn;
-    bool sync_guest = false;
+    bool flush_linear_pt = false, flush_root_pt_local = false,
+        flush_root_pt_others = false;
     uint32_t xsm_needed = 0;
     uint32_t xsm_checked = 0;
     int rc = put_old_guest_table(curr);
@@ -4041,6 +4042,8 @@ long do_mmu_update(
                         break;
                     rc = mod_l2_entry(va, l2e_from_intpte(req.val), mfn,
                                       cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
+                    if ( !rc )
+                        flush_linear_pt = true;
                     break;
 
                 case PGT_l3_page_table:
@@ -4048,6 +4051,8 @@ long do_mmu_update(
                         break;
                     rc = mod_l3_entry(va, l3e_from_intpte(req.val), mfn,
                                       cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
+                    if ( !rc )
+                        flush_linear_pt = true;
                     break;
 
                 case PGT_l4_page_table:
@@ -4055,6 +4060,8 @@ long do_mmu_update(
                         break;
                     rc = mod_l4_entry(va, l4e_from_intpte(req.val), mfn,
                                       cmd == MMU_PT_UPDATE_PRESERVE_AD, v);
+                    if ( !rc )
+                        flush_linear_pt = true;
                     if ( !rc && pt_owner->arch.pv.xpti )
                     {
                         bool local_in_use = false;
@@ -4063,7 +4070,7 @@ long do_mmu_update(
                                     mfn) )
                         {
                             local_in_use = true;
-                            get_cpu_info()->root_pgt_changed = true;
+                            flush_root_pt_local = true;
                         }
 
                         /*
@@ -4075,7 +4082,7 @@ long do_mmu_update(
                              (1 + !!(page->u.inuse.type_info & PGT_pinned) +
                               mfn_eq(pagetable_get_mfn(curr->arch.guest_table_user),
                                      mfn) + local_in_use) )
-                            sync_guest = true;
+                            flush_root_pt_others = true;
                     }
                     break;
 
@@ -4177,19 +4184,61 @@ long do_mmu_update(
     if ( va )
         unmap_domain_page(va);
 
-    if ( sync_guest )
+    /*
+     * Perform required TLB maintenance.
+     *
+     * This logic currently depends on flush_linear_pt being a superset of the
+     * flush_root_pt_* conditions.
+     *
+     * pt_owner may not be current->domain.  This may occur during
+     * construction of 32bit PV guests, or debugging of PV guests.  The
+     * behaviour cannot be correct with domain unpaused.  We therefore expect
+     * pt_owner->dirty_cpumask to be empty, but it is a waste of effort to
+     * explicitly check for an exclude this corner case.
+     *
+     * flush_linear_pt requires a FLUSH_TLB to all dirty CPUs.  The flush must
+     * be performed now to maintain correct behaviour across a multicall.
+     * i.e. we cannot relax FLUSH_TLB to FLUSH_ROOT_PGTBL, given that the
+     * former is a side effect of the latter, because the resync (which is in
+     * the return-to-guest path) happens too late.
+     *
+     * flush_root_pt_* requires FLUSH_ROOT_PGTBL on either the local CPU
+     * (implies pt_owner == current->domain and current->processor set in
+     * pt_owner->dirty_cpumask), and/or all *other* dirty CPUs as there are
+     * references we can't account for locally.
+     */
+    if ( flush_linear_pt /* || flush_root_pt_local || flush_root_pt_others */ )
     {
+        unsigned int cpu = smp_processor_id();
+        cpumask_t *mask = pt_owner->dirty_cpumask;
+
         /*
-         * Force other vCPU-s of the affected guest to pick up L4 entry
-         * changes (if any).
+         * Always handle local flushing separately (if applicable), to
+         * separate the flush invocations appropriately for scope of the two
+         * flush_root_pt_* variables.
          */
-        unsigned int cpu = smp_processor_id();
-        cpumask_t *mask = per_cpu(scratch_cpumask, cpu);
+        if ( likely(cpumask_test_cpu(cpu, mask)) )
+        {
+            mask = per_cpu(scratch_cpumask, cpu);
 
-        cpumask_andnot(mask, pt_owner->dirty_cpumask, cpumask_of(cpu));
+            cpumask_copy(mask, pt_owner->dirty_cpumask);
+            __cpumask_clear_cpu(cpu, mask);
+
+            flush_local(FLUSH_TLB |
+                        (flush_root_pt_local ? FLUSH_ROOT_PGTBL : 0));
+        }
+        else
+            /* Sanity check.  flush_root_pt_local implies local cpu is dirty. */
+            ASSERT(!flush_root_pt_local);
+
+        /* Flush the remote dirty CPUs.  Does not include the local CPU. */
         if ( !cpumask_empty(mask) )
-            flush_mask(mask, FLUSH_ROOT_PGTBL);
+            flush_mask(mask, FLUSH_TLB |
+                       (flush_root_pt_others ? FLUSH_ROOT_PGTBL : 0));
     }
+    else
+        /* Sanity check.  flush_root_pt_* implies flush_linear_pt. */
+        ASSERT(!flush_root_pt_local && !flush_root_pt_others);
 
     perfc_add(num_page_updates, i);
 
-- 
2.11.0




* Re: [PATCH v3 1/2] x86/pv: Drop FLUSH_TLB_GLOBAL in do_mmu_update() for XPTI
  2020-10-27 14:10 ` [PATCH v3 1/2] x86/pv: Drop FLUSH_TLB_GLOBAL in do_mmu_update() for XPTI Andrew Cooper
@ 2020-10-27 14:37   ` Jan Beulich
  0 siblings, 0 replies; 5+ messages in thread
From: Jan Beulich @ 2020-10-27 14:37 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 27.10.2020 15:10, Andrew Cooper wrote:
> c/s 9d1d31ad9498 "x86: slightly reduce Meltdown band-aid overhead" removed the
> use of Global TLB flushes on the Xen entry path, but added a FLUSH_TLB_GLOBAL
> to the L4 path in do_mmu_update().
> 
> However, this was unnecessary.
> 
> The L4 resync will pick up any new mappings created by the L4 change.  Any
> changes to existing mappings are the guest's responsibility to flush, and if
> one is needed, an MMUEXT_OP hypercall will follow.
> 
> This is (not really) XSA-286 (but necessary to simplify the logic).
> 
> Fixes: 9d1d31ad9498 ("x86: slightly reduce Meltdown band-aid overhead")

While - with yet another round of consideration - I agree, I
would have thought justifying the correctness of the change
by d543fa409358 ("xen/x86: disable global pages for domains with
XPTI active") would have been easier. Perhaps at least worth
mentioning?

I agree that this reasoning is necessary to warrant the Fixes:
tag, and hence to indicate backporting of this change is going
to be fine all the way through to (the backports of) this
commit.

The primary complication with the justification you use is with
the commit you refer to saying "so that inside Xen we'll also
pick up such changes" in its description. I assume your reasoning
wrt this is that any hypercall started on another vCPU before
the mmu-update here completes can't rely on Xen observing the
new entries, which is what I believe I viewed differently at the
time. Whereas any one started after the sync/flush will obviously
observe the new entries (because of the very sync). My main
problem with this reasoning has been that it "feels" as if there
might be a way by which a guest could arrange a hypercall which
ought to observe the new entries but, because of the sync
happening only on the exit paths, doesn't. Quite likely such
"arranging", if possible at all, would need to be called "out of
spec".

> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan



* Re: [PATCH v3 2/2] x86/pv: Flush TLB in response to paging structure changes
  2020-10-27 14:10 ` [PATCH v3 2/2] x86/pv: Flush TLB in response to paging structure changes Andrew Cooper
@ 2020-10-27 14:46   ` Jan Beulich
  0 siblings, 0 replies; 5+ messages in thread
From: Jan Beulich @ 2020-10-27 14:46 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Roger Pau Monné, Wei Liu, Xen-devel

On 27.10.2020 15:10, Andrew Cooper wrote:
> With MMU_UPDATE, a PV guest can make changes to higher level pagetables.  This
> is safe from Xen's point of view (as the update only affects guest mappings),
> and the guest is required to flush (if necessary) after making updates.
> 
> However, Xen's use of linear pagetables (UPDATE_VA_MAPPING, GNTTABOP_map,
> writeable pagetables, etc.) is an implementation detail outside of the
> API/ABI.
> 
> Changes in the paging structure require invalidations in the linear pagetable
> range for subsequent accesses into the linear pagetables to access non-stale
> mappings.  Xen must provide suitable flushing to prevent intermixed guest
> actions from accidentally accessing/modifying the wrong pagetable.
> 
> For all L2 and higher modifications, flush the TLB.  PV guests cannot create
> L2 or higher entries with the Global bit set, so no mappings established in
> the linear range can be global.

Perhaps "..., so no guest controlled mappings ..."?

> @@ -4177,19 +4184,61 @@ long do_mmu_update(
>      if ( va )
>          unmap_domain_page(va);
>  
> -    if ( sync_guest )
> +    /*
> +     * Perform required TLB maintenance.
> +     *
> +     * This logic currently depends on flush_linear_pt being a superset of the
> +     * flush_root_pt_* conditions.
> +     *
> +     * pt_owner may not be current->domain.  This may occur during
> +     * construction of 32bit PV guests, or debugging of PV guests.  The
> +     * behaviour cannot be correct with domain unpaused.  We therefore expect
> +     * pt_owner->dirty_cpumask to be empty, but it is a waste of effort to
> +     * explicitly check for an exclude this corner case.

Nit: s/ an / and /

Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan



Thread overview: 5+ messages
2020-10-27 14:10 [PATCH v3 0/2] x86/pv: Alternative XSA-286 fix Andrew Cooper
2020-10-27 14:10 ` [PATCH v3 1/2] x86/pv: Drop FLUSH_TLB_GLOBAL in do_mmu_update() for XPTI Andrew Cooper
2020-10-27 14:37   ` Jan Beulich
2020-10-27 14:10 ` [PATCH v3 2/2] x86/pv: Flush TLB in response to paging structure changes Andrew Cooper
2020-10-27 14:46   ` Jan Beulich
