xen-devel.lists.xenproject.org archive mirror
* [Xen-devel] [PATCH v5 0/7] x86: improve assisted tlb flush and use it in guest mode
@ 2020-02-19 17:43 Roger Pau Monne
  2020-02-19 17:43 ` [Xen-devel] [PATCH v5 1/7] x86/hvm: allow ASID flush when v != current Roger Pau Monne
                   ` (6 more replies)
  0 siblings, 7 replies; 33+ messages in thread
From: Roger Pau Monne @ 2020-02-19 17:43 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, George Dunlap, Andrew Cooper, Paul Durrant, Tim Deegan,
	Jan Beulich, Roger Pau Monne

Hello,

The following series aims to improve the TLB flush times when running
nested Xen, and it's especially beneficial when running in shim mode.

Only the HAP guest TLB flush is improved; the shadow paging TLB flush is
left as-is, and can be improved later if there's interest.

For a reference on the performance improvement see patch #7: it's a
huge increase which can also benefit other guests using assisted TLB
flushes, including the ones using the viridian TLB flush assist (ie:
Windows).

All patches have at least one Reviewed-by or Acked-by tag; the only code
change vs v4 is a fix to extract the order from the flags field in patch
#7.

Thanks, Roger.

Roger Pau Monne (7):
  x86/hvm: allow ASID flush when v != current
  x86/paging: add TLB flush hooks
  x86/hap: improve hypervisor assisted guest TLB flush
  x86/tlb: introduce a flush guests TLB flag
  x86/tlb: allow disabling the TLB clock
  xen/guest: prepare hypervisor ops to use alternative calls
  x86/tlb: use Xen L0 assisted TLB flush when available

 xen/arch/x86/flushtlb.c                | 24 ++++++---
 xen/arch/x86/guest/hyperv/hyperv.c     |  2 +-
 xen/arch/x86/guest/hypervisor.c        | 51 ++++++++++--------
 xen/arch/x86/guest/xen/xen.c           |  8 ++-
 xen/arch/x86/hvm/asid.c                |  6 +--
 xen/arch/x86/hvm/hvm.c                 | 51 ++----------------
 xen/arch/x86/mm/hap/hap.c              | 52 +++++++++++++++++++
 xen/arch/x86/mm/shadow/common.c        | 71 +++++++++++++++++++++++---
 xen/arch/x86/mm/shadow/hvm.c           |  2 +-
 xen/arch/x86/mm/shadow/multi.c         | 16 +++---
 xen/arch/x86/smp.c                     | 11 ++++
 xen/include/asm-x86/flushtlb.h         | 19 ++++++-
 xen/include/asm-x86/guest/hypervisor.h | 17 ++++++
 xen/include/asm-x86/hap.h              |  3 ++
 xen/include/asm-x86/shadow.h           | 12 +++++
 15 files changed, 246 insertions(+), 99 deletions(-)

-- 
2.25.0



* [Xen-devel] [PATCH v5 1/7] x86/hvm: allow ASID flush when v != current
  2020-02-19 17:43 [Xen-devel] [PATCH v5 0/7] x86: improve assisted tlb flush and use it in guest mode Roger Pau Monne
@ 2020-02-19 17:43 ` Roger Pau Monne
  2020-02-28 13:29   ` Jan Beulich
  2020-02-19 17:43 ` [Xen-devel] [PATCH v5 2/7] x86/paging: add TLB flush hooks Roger Pau Monne
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 33+ messages in thread
From: Roger Pau Monne @ 2020-02-19 17:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monne

Current implementation of hvm_asid_flush_vcpu is not safe to use
unless the target vCPU is either paused or the currently running one,
as it modifies the generation without any locking.

Fix this by using atomic operations when accessing the generation
field, both in hvm_asid_flush_vcpu_asid and other ASID functions. This
allows to safely flush the current ASID generation. Note that for the
flush to take effect if the vCPU is currently running a vmexit is
required.

Note the same could be achieved by introducing an extra field to
hvm_vcpu_asid that signals hvm_asid_handle_vmenter the need to call
hvm_asid_flush_vcpu on the given vCPU before vmentry, this however
seems unnecessary as hvm_asid_flush_vcpu itself only sets two vCPU
fields to 0, so there's no need to delay this to the vmentry ASID
helper.
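
For illustration, such an alternative would look roughly like the sketch
below (the need_flush field is purely hypothetical and is not added by
this series):

    /* Hypothetical extra field in struct hvm_vcpu_asid: */
    bool need_flush;

    /* A remote flush request would then only need to set the flag: */
    write_atomic(&asid->need_flush, true);

    /* And hvm_asid_handle_vmenter() would consume it before the
     * generation check, forcing a fresh ASID on the next vmentry: */
    if ( read_atomic(&asid->need_flush) )
    {
        write_atomic(&asid->need_flush, false);
        hvm_asid_flush_vcpu_asid(asid);
    }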

This is not a bugfix as no callers that would violate the assumptions
listed in the first paragraph have been found, but a preparatory
change in order to allow remote flushing of HVM vCPUs.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
---
 xen/arch/x86/hvm/asid.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/asid.c b/xen/arch/x86/hvm/asid.c
index 8e00a28443..63ce462d56 100644
--- a/xen/arch/x86/hvm/asid.c
+++ b/xen/arch/x86/hvm/asid.c
@@ -83,7 +83,7 @@ void hvm_asid_init(int nasids)
 
 void hvm_asid_flush_vcpu_asid(struct hvm_vcpu_asid *asid)
 {
-    asid->generation = 0;
+    write_atomic(&asid->generation, 0);
 }
 
 void hvm_asid_flush_vcpu(struct vcpu *v)
@@ -121,7 +121,7 @@ bool_t hvm_asid_handle_vmenter(struct hvm_vcpu_asid *asid)
         goto disabled;
 
     /* Test if VCPU has valid ASID. */
-    if ( asid->generation == data->core_asid_generation )
+    if ( read_atomic(&asid->generation) == data->core_asid_generation )
         return 0;
 
     /* If there are no free ASIDs, need to go to a new generation */
@@ -135,7 +135,7 @@ bool_t hvm_asid_handle_vmenter(struct hvm_vcpu_asid *asid)
 
     /* Now guaranteed to be a free ASID. */
     asid->asid = data->next_asid++;
-    asid->generation = data->core_asid_generation;
+    write_atomic(&asid->generation, data->core_asid_generation);
 
     /*
      * When we assign ASID 1, flush all TLB entries as we are starting a new
-- 
2.25.0



* [Xen-devel] [PATCH v5 2/7] x86/paging: add TLB flush hooks
  2020-02-19 17:43 [Xen-devel] [PATCH v5 0/7] x86: improve assisted tlb flush and use it in guest mode Roger Pau Monne
  2020-02-19 17:43 ` [Xen-devel] [PATCH v5 1/7] x86/hvm: allow ASID flush when v != current Roger Pau Monne
@ 2020-02-19 17:43 ` Roger Pau Monne
  2020-02-28 14:50   ` Jan Beulich
  2020-02-19 17:43 ` [Xen-devel] [PATCH v5 3/7] x86/hap: improve hypervisor assisted guest TLB flush Roger Pau Monne
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 33+ messages in thread
From: Roger Pau Monne @ 2020-02-19 17:43 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, George Dunlap, Andrew Cooper, Tim Deegan, Jan Beulich,
	Roger Pau Monne

Add shadow and hap implementation specific helpers to perform guest
TLB flushes. Note that the code for both is exactly the same at the
moment, and is copied from hvm_flush_vcpu_tlb. This will be changed by
further patches that will add implementation specific optimizations to
them.

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Tim Deegan <tim@xen.org>
---
Changes since v3:
 - Fix stray newline removal.
 - Fix return of shadow_flush_tlb dummy function.
---
 xen/arch/x86/hvm/hvm.c          | 51 ++----------------------------
 xen/arch/x86/mm/hap/hap.c       | 54 ++++++++++++++++++++++++++++++++
 xen/arch/x86/mm/shadow/common.c | 55 +++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hap.h       |  3 ++
 xen/include/asm-x86/shadow.h    | 12 +++++++
 5 files changed, 127 insertions(+), 48 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 00a9e70b7c..4049f57232 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -3990,55 +3990,10 @@ static void hvm_s3_resume(struct domain *d)
 bool hvm_flush_vcpu_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
                         void *ctxt)
 {
-    static DEFINE_PER_CPU(cpumask_t, flush_cpumask);
-    cpumask_t *mask = &this_cpu(flush_cpumask);
-    struct domain *d = current->domain;
-    struct vcpu *v;
-
-    /* Avoid deadlock if more than one vcpu tries this at the same time. */
-    if ( !spin_trylock(&d->hypercall_deadlock_mutex) )
-        return false;
-
-    /* Pause all other vcpus. */
-    for_each_vcpu ( d, v )
-        if ( v != current && flush_vcpu(ctxt, v) )
-            vcpu_pause_nosync(v);
-
-    /* Now that all VCPUs are signalled to deschedule, we wait... */
-    for_each_vcpu ( d, v )
-        if ( v != current && flush_vcpu(ctxt, v) )
-            while ( !vcpu_runnable(v) && v->is_running )
-                cpu_relax();
-
-    /* All other vcpus are paused, safe to unlock now. */
-    spin_unlock(&d->hypercall_deadlock_mutex);
-
-    cpumask_clear(mask);
-
-    /* Flush paging-mode soft state (e.g., va->gfn cache; PAE PDPE cache). */
-    for_each_vcpu ( d, v )
-    {
-        unsigned int cpu;
-
-        if ( !flush_vcpu(ctxt, v) )
-            continue;
-
-        paging_update_cr3(v, false);
+    struct domain *currd = current->domain;
 
-        cpu = read_atomic(&v->dirty_cpu);
-        if ( is_vcpu_dirty_cpu(cpu) )
-            __cpumask_set_cpu(cpu, mask);
-    }
-
-    /* Flush TLBs on all CPUs with dirty vcpu state. */
-    flush_tlb_mask(mask);
-
-    /* Done. */
-    for_each_vcpu ( d, v )
-        if ( v != current && flush_vcpu(ctxt, v) )
-            vcpu_unpause(v);
-
-    return true;
+    return shadow_mode_enabled(currd) ? shadow_flush_tlb(flush_vcpu, ctxt)
+                                      : hap_flush_tlb(flush_vcpu, ctxt);
 }
 
 static bool always_flush(void *ctxt, struct vcpu *v)
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index 3d93f3451c..6894c1aa38 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -669,6 +669,60 @@ static void hap_update_cr3(struct vcpu *v, int do_locking, bool noflush)
     hvm_update_guest_cr3(v, noflush);
 }
 
+bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
+                   void *ctxt)
+{
+    static DEFINE_PER_CPU(cpumask_t, flush_cpumask);
+    cpumask_t *mask = &this_cpu(flush_cpumask);
+    struct domain *d = current->domain;
+    struct vcpu *v;
+
+    /* Avoid deadlock if more than one vcpu tries this at the same time. */
+    if ( !spin_trylock(&d->hypercall_deadlock_mutex) )
+        return false;
+
+    /* Pause all other vcpus. */
+    for_each_vcpu ( d, v )
+        if ( v != current && flush_vcpu(ctxt, v) )
+            vcpu_pause_nosync(v);
+
+    /* Now that all VCPUs are signalled to deschedule, we wait... */
+    for_each_vcpu ( d, v )
+        if ( v != current && flush_vcpu(ctxt, v) )
+            while ( !vcpu_runnable(v) && v->is_running )
+                cpu_relax();
+
+    /* All other vcpus are paused, safe to unlock now. */
+    spin_unlock(&d->hypercall_deadlock_mutex);
+
+    cpumask_clear(mask);
+
+    /* Flush paging-mode soft state (e.g., va->gfn cache; PAE PDPE cache). */
+    for_each_vcpu ( d, v )
+    {
+        unsigned int cpu;
+
+        if ( !flush_vcpu(ctxt, v) )
+            continue;
+
+        paging_update_cr3(v, false);
+
+        cpu = read_atomic(&v->dirty_cpu);
+        if ( is_vcpu_dirty_cpu(cpu) )
+            __cpumask_set_cpu(cpu, mask);
+    }
+
+    /* Flush TLBs on all CPUs with dirty vcpu state. */
+    flush_tlb_mask(mask);
+
+    /* Done. */
+    for_each_vcpu ( d, v )
+        if ( v != current && flush_vcpu(ctxt, v) )
+            vcpu_unpause(v);
+
+    return true;
+}
+
 const struct paging_mode *
 hap_paging_get_mode(struct vcpu *v)
 {
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index cba3ab1eba..121ddf1255 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -3357,6 +3357,61 @@ out:
     return rc;
 }
 
+/* Flush the TLB of the selected vCPUs. */
+bool shadow_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
+                      void *ctxt)
+{
+    static DEFINE_PER_CPU(cpumask_t, flush_cpumask);
+    cpumask_t *mask = &this_cpu(flush_cpumask);
+    struct domain *d = current->domain;
+    struct vcpu *v;
+
+    /* Avoid deadlock if more than one vcpu tries this at the same time. */
+    if ( !spin_trylock(&d->hypercall_deadlock_mutex) )
+        return false;
+
+    /* Pause all other vcpus. */
+    for_each_vcpu ( d, v )
+        if ( v != current && flush_vcpu(ctxt, v) )
+            vcpu_pause_nosync(v);
+
+    /* Now that all VCPUs are signalled to deschedule, we wait... */
+    for_each_vcpu ( d, v )
+        if ( v != current && flush_vcpu(ctxt, v) )
+            while ( !vcpu_runnable(v) && v->is_running )
+                cpu_relax();
+
+    /* All other vcpus are paused, safe to unlock now. */
+    spin_unlock(&d->hypercall_deadlock_mutex);
+
+    cpumask_clear(mask);
+
+    /* Flush paging-mode soft state (e.g., va->gfn cache; PAE PDPE cache). */
+    for_each_vcpu ( d, v )
+    {
+        unsigned int cpu;
+
+        if ( !flush_vcpu(ctxt, v) )
+            continue;
+
+        paging_update_cr3(v, false);
+
+        cpu = read_atomic(&v->dirty_cpu);
+        if ( is_vcpu_dirty_cpu(cpu) )
+            __cpumask_set_cpu(cpu, mask);
+    }
+
+    /* Flush TLBs on all CPUs with dirty vcpu state. */
+    flush_tlb_mask(mask);
+
+    /* Done. */
+    for_each_vcpu ( d, v )
+        if ( v != current && flush_vcpu(ctxt, v) )
+            vcpu_unpause(v);
+
+    return true;
+}
+
 /**************************************************************************/
 /* Shadow-control XEN_DOMCTL dispatcher */
 
diff --git a/xen/include/asm-x86/hap.h b/xen/include/asm-x86/hap.h
index b94bfb4ed0..0c6aa26b9b 100644
--- a/xen/include/asm-x86/hap.h
+++ b/xen/include/asm-x86/hap.h
@@ -46,6 +46,9 @@ int   hap_track_dirty_vram(struct domain *d,
 extern const struct paging_mode *hap_paging_get_mode(struct vcpu *);
 int hap_set_allocation(struct domain *d, unsigned int pages, bool *preempted);
 
+bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
+                   void *ctxt);
+
 #endif /* XEN_HAP_H */
 
 /*
diff --git a/xen/include/asm-x86/shadow.h b/xen/include/asm-x86/shadow.h
index 907c71f497..cfd4650a16 100644
--- a/xen/include/asm-x86/shadow.h
+++ b/xen/include/asm-x86/shadow.h
@@ -95,6 +95,10 @@ void shadow_blow_tables_per_domain(struct domain *d);
 int shadow_set_allocation(struct domain *d, unsigned int pages,
                           bool *preempted);
 
+/* Flush the TLB of the selected vCPUs. */
+bool shadow_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
+                      void *ctxt);
+
 #else /* !CONFIG_SHADOW_PAGING */
 
 #define shadow_teardown(d, p) ASSERT(is_pv_domain(d))
@@ -106,6 +110,14 @@ int shadow_set_allocation(struct domain *d, unsigned int pages,
 #define shadow_set_allocation(d, pages, preempted) \
     ({ ASSERT_UNREACHABLE(); -EOPNOTSUPP; })
 
+static inline bool shadow_flush_tlb(bool (*flush_vcpu)(void *ctxt,
+                                                       struct vcpu *v),
+                                    void *ctxt)
+{
+    ASSERT_UNREACHABLE();
+    return false;
+}
+
 static inline void sh_remove_shadows(struct domain *d, mfn_t gmfn,
                                      int fast, int all) {}
 
-- 
2.25.0



* [Xen-devel] [PATCH v5 3/7] x86/hap: improve hypervisor assisted guest TLB flush
  2020-02-19 17:43 [Xen-devel] [PATCH v5 0/7] x86: improve assisted tlb flush and use it in guest mode Roger Pau Monne
  2020-02-19 17:43 ` [Xen-devel] [PATCH v5 1/7] x86/hvm: allow ASID flush when v != current Roger Pau Monne
  2020-02-19 17:43 ` [Xen-devel] [PATCH v5 2/7] x86/paging: add TLB flush hooks Roger Pau Monne
@ 2020-02-19 17:43 ` Roger Pau Monne
  2020-02-28 13:58   ` Jan Beulich
  2020-02-19 17:43 ` [Xen-devel] [PATCH v5 4/7] x86/tlb: introduce a flush guests TLB flag Roger Pau Monne
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 33+ messages in thread
From: Roger Pau Monne @ 2020-02-19 17:43 UTC (permalink / raw)
  To: xen-devel
  Cc: George Dunlap, Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monne

The current implementation of the hypervisor assisted flush for HAP is
extremely inefficient.

First of all there's no need to call paging_update_cr3, as the only
relevant part of that function when doing a flush is the vCPU ASID
flush, so call hvm_asid_flush_vcpu directly instead.

Since hvm_asid_flush_vcpu is protected against concurrent callers by
using atomic operations, there's no longer any need to pause the
affected vCPUs.

Finally, the global TLB flush performed by flush_tlb_mask is also not
necessary: since we only want to flush the guest TLB state, it's enough
to trigger a vmexit on the pCPUs currently holding any vCPU state, as
such a vmexit will already perform an ASID/VPID update, and thus clear
the guest TLB.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
---
Changes since v3:
 - s/do_flush/handle_flush/.
 - Add comment about handle_flush usage.
 - Fix VPID typo in comment.
---
 xen/arch/x86/mm/hap/hap.c | 52 +++++++++++++++++++--------------------
 1 file changed, 25 insertions(+), 27 deletions(-)

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index 6894c1aa38..dbb61bf9c6 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -669,32 +669,28 @@ static void hap_update_cr3(struct vcpu *v, int do_locking, bool noflush)
     hvm_update_guest_cr3(v, noflush);
 }
 
+/*
+ * NB: doesn't actually perform any flush, used just to clear the CPU from the
+ * mask and hence signal that the guest TLB flush has been done.
+ */
+static void handle_flush(void *data)
+{
+    cpumask_t *mask = data;
+    unsigned int cpu = smp_processor_id();
+
+    ASSERT(cpumask_test_cpu(cpu, mask));
+    cpumask_clear_cpu(cpu, mask);
+}
+
 bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
                    void *ctxt)
 {
     static DEFINE_PER_CPU(cpumask_t, flush_cpumask);
     cpumask_t *mask = &this_cpu(flush_cpumask);
     struct domain *d = current->domain;
+    unsigned int this_cpu = smp_processor_id();
     struct vcpu *v;
 
-    /* Avoid deadlock if more than one vcpu tries this at the same time. */
-    if ( !spin_trylock(&d->hypercall_deadlock_mutex) )
-        return false;
-
-    /* Pause all other vcpus. */
-    for_each_vcpu ( d, v )
-        if ( v != current && flush_vcpu(ctxt, v) )
-            vcpu_pause_nosync(v);
-
-    /* Now that all VCPUs are signalled to deschedule, we wait... */
-    for_each_vcpu ( d, v )
-        if ( v != current && flush_vcpu(ctxt, v) )
-            while ( !vcpu_runnable(v) && v->is_running )
-                cpu_relax();
-
-    /* All other vcpus are paused, safe to unlock now. */
-    spin_unlock(&d->hypercall_deadlock_mutex);
-
     cpumask_clear(mask);
 
     /* Flush paging-mode soft state (e.g., va->gfn cache; PAE PDPE cache). */
@@ -705,20 +701,22 @@ bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
         if ( !flush_vcpu(ctxt, v) )
             continue;
 
-        paging_update_cr3(v, false);
+        hvm_asid_flush_vcpu(v);
 
         cpu = read_atomic(&v->dirty_cpu);
-        if ( is_vcpu_dirty_cpu(cpu) )
+        if ( cpu != this_cpu && is_vcpu_dirty_cpu(cpu) )
             __cpumask_set_cpu(cpu, mask);
     }
 
-    /* Flush TLBs on all CPUs with dirty vcpu state. */
-    flush_tlb_mask(mask);
-
-    /* Done. */
-    for_each_vcpu ( d, v )
-        if ( v != current && flush_vcpu(ctxt, v) )
-            vcpu_unpause(v);
+    /*
+     * Trigger a vmexit on all pCPUs with dirty vCPU state in order to force an
+     * ASID/VPID change and hence accomplish a guest TLB flush. Note that vCPUs
+     * not currently running will already be flushed when scheduled because of
+     * the ASID tickle done in the loop above.
+     */
+    on_selected_cpus(mask, handle_flush, mask, 0);
+    while ( !cpumask_empty(mask) )
+        cpu_relax();
 
     return true;
 }
-- 
2.25.0



* [Xen-devel] [PATCH v5 4/7] x86/tlb: introduce a flush guests TLB flag
  2020-02-19 17:43 [Xen-devel] [PATCH v5 0/7] x86: improve assisted tlb flush and use it in guest mode Roger Pau Monne
                   ` (2 preceding siblings ...)
  2020-02-19 17:43 ` [Xen-devel] [PATCH v5 3/7] x86/hap: improve hypervisor assisted guest TLB flush Roger Pau Monne
@ 2020-02-19 17:43 ` Roger Pau Monne
  2020-02-28 16:14   ` Jan Beulich
  2020-02-19 17:43 ` [Xen-devel] [PATCH v5 5/7] x86/tlb: allow disabling the TLB clock Roger Pau Monne
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 33+ messages in thread
From: Roger Pau Monne @ 2020-02-19 17:43 UTC (permalink / raw)
  To: xen-devel
  Cc: Wei Liu, George Dunlap, Andrew Cooper, Tim Deegan, Jan Beulich,
	Roger Pau Monne

Introduce a specific flag to request a HVM guest TLB flush, which is
an ASID/VPID tickle that forces a linear TLB flush for all HVM guests.

This was previously unconditionally done in each pre_flush call, but
that's not required: HVM guests not using shadow don't require linear
TLB flushes as Xen doesn't modify the guest page tables in that case
(ie: when using HAP).

Modify all shadow code TLB flushes to also flush the guest TLB, in
order to keep the previous behavior. I haven't looked at each specific
shadow code TLB flush in order to figure out whether it actually
requires a guest TLB flush or not, so there might be room for
improvement in that regard.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Tim Deegan <tim@xen.org>
---
 xen/arch/x86/flushtlb.c         |  5 +++--
 xen/arch/x86/mm/shadow/common.c | 18 +++++++++---------
 xen/arch/x86/mm/shadow/hvm.c    |  2 +-
 xen/arch/x86/mm/shadow/multi.c  | 16 ++++++++--------
 xen/include/asm-x86/flushtlb.h  |  2 ++
 5 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
index 03f92c23dc..e7ccd4ec7b 100644
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -59,8 +59,6 @@ static u32 pre_flush(void)
         raise_softirq(NEW_TLBFLUSH_CLOCK_PERIOD_SOFTIRQ);
 
  skip_clocktick:
-    hvm_flush_guest_tlbs();
-
     return t2;
 }
 
@@ -221,6 +219,9 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
             do_tlb_flush();
     }
 
+    if ( flags & FLUSH_GUESTS_TLB )
+        hvm_flush_guest_tlbs();
+
     if ( flags & FLUSH_CACHE )
     {
         const struct cpuinfo_x86 *c = &current_cpu_data;
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index 121ddf1255..4847f24d3b 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -363,7 +363,7 @@ static int oos_remove_write_access(struct vcpu *v, mfn_t gmfn,
     }
 
     if ( ftlb )
-        flush_tlb_mask(d->dirty_cpumask);
+        flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
 
     return 0;
 }
@@ -939,7 +939,7 @@ static void _shadow_prealloc(struct domain *d, unsigned int pages)
                 /* See if that freed up enough space */
                 if ( d->arch.paging.shadow.free_pages >= pages )
                 {
-                    flush_tlb_mask(d->dirty_cpumask);
+                    flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
                     return;
                 }
             }
@@ -993,7 +993,7 @@ static void shadow_blow_tables(struct domain *d)
                                pagetable_get_mfn(v->arch.shadow_table[i]), 0);
 
     /* Make sure everyone sees the unshadowings */
-    flush_tlb_mask(d->dirty_cpumask);
+    flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
 }
 
 void shadow_blow_tables_per_domain(struct domain *d)
@@ -1102,7 +1102,7 @@ mfn_t shadow_alloc(struct domain *d,
         if ( unlikely(!cpumask_empty(&mask)) )
         {
             perfc_incr(shadow_alloc_tlbflush);
-            flush_tlb_mask(&mask);
+            flush_mask(&mask, FLUSH_TLB | FLUSH_GUESTS_TLB);
         }
         /* Now safe to clear the page for reuse */
         clear_domain_page(page_to_mfn(sp));
@@ -2290,7 +2290,7 @@ void sh_remove_shadows(struct domain *d, mfn_t gmfn, int fast, int all)
 
     /* Need to flush TLBs now, so that linear maps are safe next time we
      * take a fault. */
-    flush_tlb_mask(d->dirty_cpumask);
+    flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
 
     paging_unlock(d);
 }
@@ -3005,7 +3005,7 @@ static void sh_unshadow_for_p2m_change(struct domain *d, unsigned long gfn,
         {
             sh_remove_all_shadows_and_parents(d, mfn);
             if ( sh_remove_all_mappings(d, mfn, _gfn(gfn)) )
-                flush_tlb_mask(d->dirty_cpumask);
+                flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
         }
     }
 
@@ -3045,7 +3045,7 @@ static void sh_unshadow_for_p2m_change(struct domain *d, unsigned long gfn,
                 }
                 omfn = mfn_add(omfn, 1);
             }
-            flush_tlb_mask(&flushmask);
+            flush_mask(&flushmask, FLUSH_TLB | FLUSH_GUESTS_TLB);
 
             if ( npte )
                 unmap_domain_page(npte);
@@ -3332,7 +3332,7 @@ int shadow_track_dirty_vram(struct domain *d,
         }
     }
     if ( flush_tlb )
-        flush_tlb_mask(d->dirty_cpumask);
+        flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
     goto out;
 
 out_sl1ma:
@@ -3402,7 +3402,7 @@ bool shadow_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
     }
 
     /* Flush TLBs on all CPUs with dirty vcpu state. */
-    flush_tlb_mask(mask);
+    flush_mask(mask, FLUSH_TLB | FLUSH_GUESTS_TLB);
 
     /* Done. */
     for_each_vcpu ( d, v )
diff --git a/xen/arch/x86/mm/shadow/hvm.c b/xen/arch/x86/mm/shadow/hvm.c
index a219266fa2..64077d181b 100644
--- a/xen/arch/x86/mm/shadow/hvm.c
+++ b/xen/arch/x86/mm/shadow/hvm.c
@@ -590,7 +590,7 @@ static void validate_guest_pt_write(struct vcpu *v, mfn_t gmfn,
 
     if ( rc & SHADOW_SET_FLUSH )
         /* Need to flush TLBs to pick up shadow PT changes */
-        flush_tlb_mask(d->dirty_cpumask);
+        flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
 
     if ( rc & SHADOW_SET_ERROR )
     {
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 26798b317c..22aeb97b1e 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3066,7 +3066,7 @@ static int sh_page_fault(struct vcpu *v,
         perfc_incr(shadow_rm_write_flush_tlb);
         smp_wmb();
         atomic_inc(&d->arch.paging.shadow.gtable_dirty_version);
-        flush_tlb_mask(d->dirty_cpumask);
+        flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
     }
 
 #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
@@ -3575,7 +3575,7 @@ static bool sh_invlpg(struct vcpu *v, unsigned long linear)
     if ( mfn_to_page(sl1mfn)->u.sh.type
          == SH_type_fl1_shadow )
     {
-        flush_tlb_local();
+        flush_local(FLUSH_TLB | FLUSH_GUESTS_TLB);
         return false;
     }
 
@@ -3810,7 +3810,7 @@ sh_update_linear_entries(struct vcpu *v)
          * table entry. But, without this change, it would fetch the wrong
          * value due to a stale TLB.
          */
-        flush_tlb_local();
+        flush_local(FLUSH_TLB | FLUSH_GUESTS_TLB);
     }
 }
 
@@ -4011,7 +4011,7 @@ sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
      * (old) shadow linear maps in the writeable mapping heuristics. */
 #if GUEST_PAGING_LEVELS == 2
     if ( sh_remove_write_access(d, gmfn, 2, 0) != 0 )
-        flush_tlb_mask(d->dirty_cpumask);
+        flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
     sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l2_shadow);
 #elif GUEST_PAGING_LEVELS == 3
     /* PAE guests have four shadow_table entries, based on the
@@ -4035,7 +4035,7 @@ sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
             }
         }
         if ( flush )
-            flush_tlb_mask(d->dirty_cpumask);
+            flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
         /* Now install the new shadows. */
         for ( i = 0; i < 4; i++ )
         {
@@ -4056,7 +4056,7 @@ sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
     }
 #elif GUEST_PAGING_LEVELS == 4
     if ( sh_remove_write_access(d, gmfn, 4, 0) != 0 )
-        flush_tlb_mask(d->dirty_cpumask);
+        flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
     sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l4_shadow);
     if ( !shadow_mode_external(d) && !is_pv_32bit_domain(d) )
     {
@@ -4502,7 +4502,7 @@ static void sh_pagetable_dying(paddr_t gpa)
         }
     }
     if ( flush )
-        flush_tlb_mask(d->dirty_cpumask);
+        flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
 
     /* Remember that we've seen the guest use this interface, so we
      * can rely on it using it in future, instead of guessing at
@@ -4539,7 +4539,7 @@ static void sh_pagetable_dying(paddr_t gpa)
         mfn_to_page(gmfn)->pagetable_dying = true;
         shadow_unhook_mappings(d, smfn, 1/* user pages only */);
         /* Now flush the TLB: we removed toplevel mappings. */
-        flush_tlb_mask(d->dirty_cpumask);
+        flush_mask(d->dirty_cpumask, FLUSH_TLB | FLUSH_GUESTS_TLB);
     }
 
     /* Remember that we've seen the guest use this interface, so we
diff --git a/xen/include/asm-x86/flushtlb.h b/xen/include/asm-x86/flushtlb.h
index 2cfe4e6e97..07f9bc6103 100644
--- a/xen/include/asm-x86/flushtlb.h
+++ b/xen/include/asm-x86/flushtlb.h
@@ -105,6 +105,8 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4);
 #define FLUSH_VCPU_STATE 0x1000
  /* Flush the per-cpu root page table */
 #define FLUSH_ROOT_PGTBL 0x2000
+ /* Flush all HVM guests linear TLB (using ASID/VPID) */
+#define FLUSH_GUESTS_TLB 0x4000
 
 /* Flush local TLBs/caches. */
 unsigned int flush_area_local(const void *va, unsigned int flags);
-- 
2.25.0



* [Xen-devel] [PATCH v5 5/7] x86/tlb: allow disabling the TLB clock
  2020-02-19 17:43 [Xen-devel] [PATCH v5 0/7] x86: improve assisted tlb flush and use it in guest mode Roger Pau Monne
                   ` (3 preceding siblings ...)
  2020-02-19 17:43 ` [Xen-devel] [PATCH v5 4/7] x86/tlb: introduce a flush guests TLB flag Roger Pau Monne
@ 2020-02-19 17:43 ` Roger Pau Monne
  2020-02-28 16:25   ` Jan Beulich
  2020-02-19 17:43 ` [Xen-devel] [PATCH v5 6/7] xen/guest: prepare hypervisor ops to use alternative calls Roger Pau Monne
  2020-02-19 17:43 ` [Xen-devel] [PATCH v5 7/7] x86/tlb: use Xen L0 assisted TLB flush when available Roger Pau Monne
  6 siblings, 1 reply; 33+ messages in thread
From: Roger Pau Monne @ 2020-02-19 17:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monne

The TLB clock is helpful when running Xen on bare metal because when
doing a TLB flush each CPU is IPI'ed and can keep a timestamp of the
last flush.

This is not the case however when Xen is running virtualized and the
underlying hypervisor provides mechanisms to assist in performing TLB
flushes: Xen itself for example offers an HVMOP_flush_tlbs hypercall in
order to perform a TLB flush without having to IPI each CPU. When
using such mechanisms it's no longer possible to keep a timestamp of
the flushes on each CPU, as they are performed by the underlying
hypervisor.

Offer a boolean in order to signal Xen that the timestamped TLB
shouldn't be used. This avoids keeping the timestamps of the flushes,
and also forces NEED_FLUSH to always return true.

No functional change intended, as this change doesn't introduce any
user that disables the timestamped TLB.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
---
 xen/arch/x86/flushtlb.c        | 19 +++++++++++++------
 xen/include/asm-x86/flushtlb.h | 17 ++++++++++++++++-
 2 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
index e7ccd4ec7b..3649900793 100644
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -32,6 +32,9 @@
 u32 tlbflush_clock = 1U;
 DEFINE_PER_CPU(u32, tlbflush_time);
 
+/* Signals whether the TLB flush clock is in use. */
+bool __read_mostly tlb_clk_enabled = true;
+
 /*
  * pre_flush(): Increment the virtual TLB-flush clock. Returns new clock value.
  * 
@@ -82,12 +85,13 @@ static void post_flush(u32 t)
 static void do_tlb_flush(void)
 {
     unsigned long flags, cr4;
-    u32 t;
+    u32 t = 0;
 
     /* This non-reentrant function is sometimes called in interrupt context. */
     local_irq_save(flags);
 
-    t = pre_flush();
+    if ( tlb_clk_enabled )
+        t = pre_flush();
 
     if ( use_invpcid )
         invpcid_flush_all();
@@ -99,7 +103,8 @@ static void do_tlb_flush(void)
     else
         write_cr3(read_cr3());
 
-    post_flush(t);
+    if ( tlb_clk_enabled )
+        post_flush(t);
 
     local_irq_restore(flags);
 }
@@ -107,7 +112,7 @@ static void do_tlb_flush(void)
 void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
 {
     unsigned long flags, old_cr4;
-    u32 t;
+    u32 t = 0;
 
     /* Throughout this function we make this assumption: */
     ASSERT(!(cr4 & X86_CR4_PCIDE) || !(cr4 & X86_CR4_PGE));
@@ -115,7 +120,8 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
     /* This non-reentrant function is sometimes called in interrupt context. */
     local_irq_save(flags);
 
-    t = pre_flush();
+    if ( tlb_clk_enabled )
+        t = pre_flush();
 
     old_cr4 = read_cr4();
     ASSERT(!(old_cr4 & X86_CR4_PCIDE) || !(old_cr4 & X86_CR4_PGE));
@@ -167,7 +173,8 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4)
     if ( cr4 & X86_CR4_PCIDE )
         invpcid_flush_all_nonglobals();
 
-    post_flush(t);
+    if ( tlb_clk_enabled )
+        post_flush(t);
 
     local_irq_restore(flags);
 }
diff --git a/xen/include/asm-x86/flushtlb.h b/xen/include/asm-x86/flushtlb.h
index 07f9bc6103..9773014320 100644
--- a/xen/include/asm-x86/flushtlb.h
+++ b/xen/include/asm-x86/flushtlb.h
@@ -21,10 +21,21 @@ extern u32 tlbflush_clock;
 /* Time at which each CPU's TLB was last flushed. */
 DECLARE_PER_CPU(u32, tlbflush_time);
 
-#define tlbflush_current_time() tlbflush_clock
+/* TLB clock is in use. */
+extern bool tlb_clk_enabled;
+
+static inline uint32_t tlbflush_current_time(void)
+{
+    /* Returning 0 from tlbflush_current_time will always force a flush. */
+    return tlb_clk_enabled ? tlbflush_clock : 0;
+}
 
 static inline void page_set_tlbflush_timestamp(struct page_info *page)
 {
+    /* Avoid the write if the TLB clock is disabled. */
+    if ( !tlb_clk_enabled )
+        return;
+
     /*
      * Prevent storing a stale time stamp, which could happen if an update
      * to tlbflush_clock plus a subsequent flush IPI happen between the
@@ -67,6 +78,10 @@ static inline void tlbflush_filter(cpumask_t *mask, uint32_t page_timestamp)
 {
     unsigned int cpu;
 
+    /* Short-circuit: there's no need to iterate if the clock is disabled. */
+    if ( !tlb_clk_enabled )
+        return;
+
     for_each_cpu ( cpu, mask )
         if ( !NEED_FLUSH(per_cpu(tlbflush_time, cpu), page_timestamp) )
             __cpumask_clear_cpu(cpu, mask);
-- 
2.25.0



* [Xen-devel] [PATCH v5 6/7] xen/guest: prepare hypervisor ops to use alternative calls
  2020-02-19 17:43 [Xen-devel] [PATCH v5 0/7] x86: improve assisted tlb flush and use it in guest mode Roger Pau Monne
                   ` (4 preceding siblings ...)
  2020-02-19 17:43 ` [Xen-devel] [PATCH v5 5/7] x86/tlb: allow disabling the TLB clock Roger Pau Monne
@ 2020-02-19 17:43 ` Roger Pau Monne
  2020-02-28 16:29   ` Jan Beulich
  2020-02-19 17:43 ` [Xen-devel] [PATCH v5 7/7] x86/tlb: use Xen L0 assisted TLB flush when available Roger Pau Monne
  6 siblings, 1 reply; 33+ messages in thread
From: Roger Pau Monne @ 2020-02-19 17:43 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Paul Durrant, Jan Beulich, Wei Liu, Roger Pau Monne

Adapt the hypervisor ops framework so it can be used with the
alternative calls framework. So far no hooks are modified to make use
of the alternatives patching, as they are not in any hot path.

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Reviewed-by: Paul Durrant <pdurrant@amazon.com>
---
Changes since v3:
 - New in this version.
---
 xen/arch/x86/guest/hyperv/hyperv.c |  2 +-
 xen/arch/x86/guest/hypervisor.c    | 41 +++++++++++++++---------------
 xen/arch/x86/guest/xen/xen.c       |  2 +-
 3 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/xen/arch/x86/guest/hyperv/hyperv.c b/xen/arch/x86/guest/hyperv/hyperv.c
index fabc62b0d6..70f4cd5ae0 100644
--- a/xen/arch/x86/guest/hyperv/hyperv.c
+++ b/xen/arch/x86/guest/hyperv/hyperv.c
@@ -199,7 +199,7 @@ static void __init e820_fixup(struct e820map *e820)
         panic("Unable to reserve Hyper-V hypercall range\n");
 }
 
-static const struct hypervisor_ops ops = {
+static const struct hypervisor_ops __initdata ops = {
     .name = "Hyper-V",
     .setup = setup,
     .ap_setup = ap_setup,
diff --git a/xen/arch/x86/guest/hypervisor.c b/xen/arch/x86/guest/hypervisor.c
index 5fd433c8d4..647cdb1367 100644
--- a/xen/arch/x86/guest/hypervisor.c
+++ b/xen/arch/x86/guest/hypervisor.c
@@ -24,52 +24,53 @@
 #include <asm/cache.h>
 #include <asm/guest.h>
 
-static const struct hypervisor_ops *__read_mostly ops;
+static struct hypervisor_ops __read_mostly ops;
 
 const char *__init hypervisor_probe(void)
 {
+    const struct hypervisor_ops *fns;
+
     if ( !cpu_has_hypervisor )
         return NULL;
 
-    ops = xg_probe();
-    if ( ops )
-        return ops->name;
+    fns = xg_probe();
+    if ( !fns )
+        /*
+         * Detection of Hyper-V must come after Xen to avoid false positive due
+         * to viridian support
+         */
+        fns = hyperv_probe();
 
-    /*
-     * Detection of Hyper-V must come after Xen to avoid false positive due
-     * to viridian support
-     */
-    ops = hyperv_probe();
-    if ( ops )
-        return ops->name;
+    if ( fns )
+        ops = *fns;
 
-    return NULL;
+    return ops.name;
 }
 
 void __init hypervisor_setup(void)
 {
-    if ( ops && ops->setup )
-        ops->setup();
+    if ( ops.setup )
+        ops.setup();
 }
 
 int hypervisor_ap_setup(void)
 {
-    if ( ops && ops->ap_setup )
-        return ops->ap_setup();
+    if ( ops.ap_setup )
+        return ops.ap_setup();
 
     return 0;
 }
 
 void hypervisor_resume(void)
 {
-    if ( ops && ops->resume )
-        ops->resume();
+    if ( ops.resume )
+        ops.resume();
 }
 
 void __init hypervisor_e820_fixup(struct e820map *e820)
 {
-    if ( ops && ops->e820_fixup )
-        ops->e820_fixup(e820);
+    if ( ops.e820_fixup )
+        ops.e820_fixup(e820);
 }
 
 /*
diff --git a/xen/arch/x86/guest/xen/xen.c b/xen/arch/x86/guest/xen/xen.c
index 3cf8f667a1..f151b07548 100644
--- a/xen/arch/x86/guest/xen/xen.c
+++ b/xen/arch/x86/guest/xen/xen.c
@@ -324,7 +324,7 @@ static void __init e820_fixup(struct e820map *e820)
         pv_shim_fixup_e820(e820);
 }
 
-static const struct hypervisor_ops ops = {
+static const struct hypervisor_ops __initdata ops = {
     .name = "Xen",
     .setup = setup,
     .ap_setup = ap_setup,
-- 
2.25.0



* [Xen-devel] [PATCH v5 7/7] x86/tlb: use Xen L0 assisted TLB flush when available
  2020-02-19 17:43 [Xen-devel] [PATCH v5 0/7] x86: improve assisted tlb flush and use it in guest mode Roger Pau Monne
                   ` (5 preceding siblings ...)
  2020-02-19 17:43 ` [Xen-devel] [PATCH v5 6/7] xen/guest: prepare hypervisor ops to use alternative calls Roger Pau Monne
@ 2020-02-19 17:43 ` Roger Pau Monne
  2020-02-28 17:00   ` Jan Beulich
  6 siblings, 1 reply; 33+ messages in thread
From: Roger Pau Monne @ 2020-02-19 17:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Jan Beulich, Roger Pau Monne

Use Xen's L0 HVMOP_flush_tlbs hypercall in order to perform flushes.
This greatly increases the performance of TLB flushes when running
with a high number of vCPUs as a Xen guest, and is especially important
when running in shim mode.

The following figures are from a PV guest running `make -j32 xen` in
shim mode with 32 vCPUs and HAP.

Using x2APIC and ALLBUT shorthand:
real	4m35.973s
user	4m35.110s
sys	36m24.117s

Using L0 assisted flush:
real    1m2.596s
user    4m34.818s
sys     5m16.374s

The implementation adds a new hook to hypervisor_ops so other
enlightenments can also implement such an assisted flush just by
filling the hook. Note that the Xen implementation completely ignores
the dirty CPU mask and the linear address passed in, and always performs
a global TLB flush on all vCPUs.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
---
Changes since v4:
 - Adjust order calculation.

Changes since v3:
 - Use an alternative call for the flush hook.

Changes since v1:
 - Add a L0 assisted hook to hypervisor ops.
---
 xen/arch/x86/guest/hypervisor.c        | 10 ++++++++++
 xen/arch/x86/guest/xen/xen.c           |  6 ++++++
 xen/arch/x86/smp.c                     | 11 +++++++++++
 xen/include/asm-x86/guest/hypervisor.h | 17 +++++++++++++++++
 4 files changed, 44 insertions(+)

diff --git a/xen/arch/x86/guest/hypervisor.c b/xen/arch/x86/guest/hypervisor.c
index 647cdb1367..47e938e287 100644
--- a/xen/arch/x86/guest/hypervisor.c
+++ b/xen/arch/x86/guest/hypervisor.c
@@ -18,6 +18,7 @@
  *
  * Copyright (c) 2019 Microsoft.
  */
+#include <xen/cpumask.h>
 #include <xen/init.h>
 #include <xen/types.h>
 
@@ -73,6 +74,15 @@ void __init hypervisor_e820_fixup(struct e820map *e820)
         ops.e820_fixup(e820);
 }
 
+int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
+                         unsigned int order)
+{
+    if ( ops.flush_tlb )
+        return alternative_call(ops.flush_tlb, mask, va, order);
+
+    return -ENOSYS;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/guest/xen/xen.c b/xen/arch/x86/guest/xen/xen.c
index f151b07548..5d3427a713 100644
--- a/xen/arch/x86/guest/xen/xen.c
+++ b/xen/arch/x86/guest/xen/xen.c
@@ -324,12 +324,18 @@ static void __init e820_fixup(struct e820map *e820)
         pv_shim_fixup_e820(e820);
 }
 
+static int flush_tlb(const cpumask_t *mask, const void *va, unsigned int order)
+{
+    return xen_hypercall_hvm_op(HVMOP_flush_tlbs, NULL);
+}
+
 static const struct hypervisor_ops __initdata ops = {
     .name = "Xen",
     .setup = setup,
     .ap_setup = ap_setup,
     .resume = resume,
     .e820_fixup = e820_fixup,
+    .flush_tlb = flush_tlb,
 };
 
 const struct hypervisor_ops *__init xg_probe(void)
diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index fac295fa6f..55d08c9d52 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -15,6 +15,7 @@
 #include <xen/perfc.h>
 #include <xen/spinlock.h>
 #include <asm/current.h>
+#include <asm/guest.h>
 #include <asm/smp.h>
 #include <asm/mc146818rtc.h>
 #include <asm/flushtlb.h>
@@ -256,6 +257,16 @@ void flush_area_mask(const cpumask_t *mask, const void *va, unsigned int flags)
     if ( (flags & ~FLUSH_ORDER_MASK) &&
          !cpumask_subset(mask, cpumask_of(cpu)) )
     {
+        if ( cpu_has_hypervisor &&
+             !(flags & ~(FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_VA_VALID |
+                         FLUSH_ORDER_MASK)) &&
+             !hypervisor_flush_tlb(mask, va, (flags - 1) & FLUSH_ORDER_MASK) )
+        {
+            if ( tlb_clk_enabled )
+                tlb_clk_enabled = false;
+            return;
+        }
+
         spin_lock(&flush_lock);
         cpumask_and(&flush_cpumask, mask, &cpu_online_map);
         cpumask_clear_cpu(cpu, &flush_cpumask);
diff --git a/xen/include/asm-x86/guest/hypervisor.h b/xen/include/asm-x86/guest/hypervisor.h
index ade10e74ea..432e57c2a0 100644
--- a/xen/include/asm-x86/guest/hypervisor.h
+++ b/xen/include/asm-x86/guest/hypervisor.h
@@ -19,6 +19,8 @@
 #ifndef __X86_HYPERVISOR_H__
 #define __X86_HYPERVISOR_H__
 
+#include <xen/cpumask.h>
+
 #include <asm/e820.h>
 
 struct hypervisor_ops {
@@ -32,6 +34,8 @@ struct hypervisor_ops {
     void (*resume)(void);
     /* Fix up e820 map */
     void (*e820_fixup)(struct e820map *e820);
+    /* L0 assisted TLB flush */
+    int (*flush_tlb)(const cpumask_t *mask, const void *va, unsigned int order);
 };
 
 #ifdef CONFIG_GUEST
@@ -41,6 +45,14 @@ void hypervisor_setup(void);
 int hypervisor_ap_setup(void);
 void hypervisor_resume(void);
 void hypervisor_e820_fixup(struct e820map *e820);
+/*
+ * L0 assisted TLB flush.
+ * mask: cpumask of the dirty vCPUs that should be flushed.
+ * va: linear address to flush, or NULL for global flushes.
+ * order: order of the linear address pointed by va.
+ */
+int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
+                         unsigned int order);
 
 #else
 
@@ -52,6 +64,11 @@ static inline void hypervisor_setup(void) { ASSERT_UNREACHABLE(); }
 static inline int hypervisor_ap_setup(void) { return 0; }
 static inline void hypervisor_resume(void) { ASSERT_UNREACHABLE(); }
 static inline void hypervisor_e820_fixup(struct e820map *e820) {}
+static inline int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
+                                       unsigned int order)
+{
+    return -ENOSYS;
+}
 
 #endif  /* CONFIG_GUEST */
 
-- 
2.25.0



* Re: [Xen-devel] [PATCH v5 1/7] x86/hvm: allow ASID flush when v != current
  2020-02-19 17:43 ` [Xen-devel] [PATCH v5 1/7] x86/hvm: allow ASID flush when v != current Roger Pau Monne
@ 2020-02-28 13:29   ` Jan Beulich
  2020-02-28 15:27     ` Roger Pau Monné
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2020-02-28 13:29 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Wei Liu, Andrew Cooper

On 19.02.2020 18:43, Roger Pau Monne wrote:
> Current implementation of hvm_asid_flush_vcpu is not safe to use
> unless the target vCPU is either paused or the currently running one,
> as it modifies the generation without any locking.

Indeed, but the issue you're taking care of is highly theoretical:
I don't think any sane compiler will split writes of the fields
to multiple insns. It would be nice if this was made clear here.

> Fix this by using atomic operations when accessing the generation
> field, both in hvm_asid_flush_vcpu_asid and other ASID functions. This
> allows to safely flush the current ASID generation. Note that for the
> flush to take effect if the vCPU is currently running a vmexit is
> required.
> 
> Note the same could be achieved by introducing an extra field to
> hvm_vcpu_asid that signals hvm_asid_handle_vmenter the need to call
> hvm_asid_flush_vcpu on the given vCPU before vmentry, this however
> seems unnecessary as hvm_asid_flush_vcpu itself only sets two vCPU
> fields to 0, so there's no need to delay this to the vmentry ASID
> helper.
> 
> This is not a bugfix as no callers that would violate the assumptions
> listed in the first paragraph have been found, but a preparatory
> change in order to allow remote flushing of HVM vCPUs.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Reviewed-by: Wei Liu <wl@xen.org>

With a suitable clarification added
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan


* Re: [Xen-devel] [PATCH v5 3/7] x86/hap: improve hypervisor assisted guest TLB flush
  2020-02-19 17:43 ` [Xen-devel] [PATCH v5 3/7] x86/hap: improve hypervisor assisted guest TLB flush Roger Pau Monne
@ 2020-02-28 13:58   ` Jan Beulich
  2020-02-28 16:31     ` Roger Pau Monné
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2020-02-28 13:58 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: George Dunlap, xen-devel, Wei Liu, Andrew Cooper

On 19.02.2020 18:43, Roger Pau Monne wrote:
> --- a/xen/arch/x86/mm/hap/hap.c
> +++ b/xen/arch/x86/mm/hap/hap.c
> @@ -669,32 +669,28 @@ static void hap_update_cr3(struct vcpu *v, int do_locking, bool noflush)
>      hvm_update_guest_cr3(v, noflush);
>  }
>  
> +/*
> + * NB: doesn't actually perform any flush, used just to clear the CPU from the
> + * mask and hence signal that the guest TLB flush has been done.
> + */

"has been done" isn't correct, as the flush may happen only later
on (upon next VM entry). I think wording here needs to be as
precise as possible, however the comment may turn out unnecessary
altogether:

> @@ -705,20 +701,22 @@ bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
>          if ( !flush_vcpu(ctxt, v) )
>              continue;
>  
> -        paging_update_cr3(v, false);
> +        hvm_asid_flush_vcpu(v);
>  
>          cpu = read_atomic(&v->dirty_cpu);
> -        if ( is_vcpu_dirty_cpu(cpu) )
> +        if ( cpu != this_cpu && is_vcpu_dirty_cpu(cpu) )
>              __cpumask_set_cpu(cpu, mask);
>      }
>  
> -    /* Flush TLBs on all CPUs with dirty vcpu state. */
> -    flush_tlb_mask(mask);
> -
> -    /* Done. */
> -    for_each_vcpu ( d, v )
> -        if ( v != current && flush_vcpu(ctxt, v) )
> -            vcpu_unpause(v);
> +    /*
> +     * Trigger a vmexit on all pCPUs with dirty vCPU state in order to force an
> +     * ASID/VPID change and hence accomplish a guest TLB flush. Note that vCPUs
> +     * not currently running will already be flushed when scheduled because of
> +     * the ASID tickle done in the loop above.
> +     */
> +    on_selected_cpus(mask, handle_flush, mask, 0);
> +    while ( !cpumask_empty(mask) )
> +        cpu_relax();

Why do you pass 0 as the last argument? If you passed 1, you wouldn't
need the while() here, handle_flush() could be empty, and you'd
avoid a perhaps large number of CPUs all trying to modify two
cache lines (the one used by on_selected_cpus() itself and the
one here) instead of just one. Yes, latency from the last CPU
clearing its bit to you being able to exit from here may be a
little larger this way, but overall latency may shrink with the
amount of atomic ops issued to the bus cut by half.

(Forcing an empty function to be called may seem odd, but sending
 just some IPI [like the event check one] wouldn't be enough, as
 you want to be sure the other side has actually received the
 request.)
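
I.e. something along these lines (just a sketch; the relevant bits are
the empty handler and the wait == 1 argument):

    static void handle_flush(void *data)
    {
        /* Nothing to do: the vmexit triggered by the IPI is all we need. */
    }

    ...
        /*
         * With wait == 1 on_selected_cpus() only returns once handle_flush()
         * has run on every CPU in mask, so no separate polling loop and no
         * second shared cache line are needed.
         */
        on_selected_cpus(mask, handle_flush, NULL, 1);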

Jan


* Re: [Xen-devel] [PATCH v5 2/7] x86/paging: add TLB flush hooks
  2020-02-19 17:43 ` [Xen-devel] [PATCH v5 2/7] x86/paging: add TLB flush hooks Roger Pau Monne
@ 2020-02-28 14:50   ` Jan Beulich
  2020-02-28 16:19     ` Roger Pau Monné
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2020-02-28 14:50 UTC (permalink / raw)
  To: Roger Pau Monne, Andrew Cooper
  Cc: George Dunlap, xen-devel, Tim Deegan, Wei Liu

On 19.02.2020 18:43, Roger Pau Monne wrote:
> Add shadow and hap implementation specific helpers to perform guest
> TLB flushes. Note that the code for both is exactly the same at the
> moment, and is copied from hvm_flush_vcpu_tlb. This will be changed by
> further patches that will add implementation specific optimizations to
> them.
> 
> No functional change intended.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Reviewed-by: Wei Liu <wl@xen.org>
> Acked-by: Tim Deegan <tim@xen.org>

This looks good in principle, with one possible anomaly:

> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -3990,55 +3990,10 @@ static void hvm_s3_resume(struct domain *d)
>  bool hvm_flush_vcpu_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
>                          void *ctxt)
>  {
> -    static DEFINE_PER_CPU(cpumask_t, flush_cpumask);
> -    cpumask_t *mask = &this_cpu(flush_cpumask);
> -    struct domain *d = current->domain;
> -    struct vcpu *v;
> -
> -    /* Avoid deadlock if more than one vcpu tries this at the same time. */
> -    if ( !spin_trylock(&d->hypercall_deadlock_mutex) )
> -        return false;
> -
> -    /* Pause all other vcpus. */
> -    for_each_vcpu ( d, v )
> -        if ( v != current && flush_vcpu(ctxt, v) )
> -            vcpu_pause_nosync(v);
> -
> -    /* Now that all VCPUs are signalled to deschedule, we wait... */
> -    for_each_vcpu ( d, v )
> -        if ( v != current && flush_vcpu(ctxt, v) )
> -            while ( !vcpu_runnable(v) && v->is_running )
> -                cpu_relax();
> -
> -    /* All other vcpus are paused, safe to unlock now. */
> -    spin_unlock(&d->hypercall_deadlock_mutex);
> -
> -    cpumask_clear(mask);
> -
> -    /* Flush paging-mode soft state (e.g., va->gfn cache; PAE PDPE cache). */
> -    for_each_vcpu ( d, v )
> -    {
> -        unsigned int cpu;
> -
> -        if ( !flush_vcpu(ctxt, v) )
> -            continue;
> -
> -        paging_update_cr3(v, false);
> +    struct domain *currd = current->domain;
>  
> -        cpu = read_atomic(&v->dirty_cpu);
> -        if ( is_vcpu_dirty_cpu(cpu) )
> -            __cpumask_set_cpu(cpu, mask);
> -    }
> -
> -    /* Flush TLBs on all CPUs with dirty vcpu state. */
> -    flush_tlb_mask(mask);
> -
> -    /* Done. */
> -    for_each_vcpu ( d, v )
> -        if ( v != current && flush_vcpu(ctxt, v) )
> -            vcpu_unpause(v);
> -
> -    return true;
> +    return shadow_mode_enabled(currd) ? shadow_flush_tlb(flush_vcpu, ctxt)
> +                                      : hap_flush_tlb(flush_vcpu, ctxt);
>  }

Following our current model I think this should be a new pointer
in struct paging_mode (then truly fitting "hooks" in the title).
I can see the desire to avoid the indirect call though, but I
also think that if we were to go that route, we should settle on
switching around others as well which are paging mode dependent.
(FAOD this is nothing I ask you to do here.) Andrew, thoughts?
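
I.e. roughly the following (hook member and wrapper shown only for
illustration, not as the final shape):

    /* New member in struct paging_mode: */
    bool (*flush_tlb)(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
                      void *ctxt);

    /* hvm_flush_vcpu_tlb() would then simply become: */
    bool hvm_flush_vcpu_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
                            void *ctxt)
    {
        return paging_get_hostmode(current)->flush_tlb(flush_vcpu, ctxt);
    }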

Jan


* Re: [Xen-devel] [PATCH v5 1/7] x86/hvm: allow ASID flush when v != current
  2020-02-28 13:29   ` Jan Beulich
@ 2020-02-28 15:27     ` Roger Pau Monné
  2020-02-28 15:47       ` Jan Beulich
  0 siblings, 1 reply; 33+ messages in thread
From: Roger Pau Monné @ 2020-02-28 15:27 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Wei Liu, Andrew Cooper

On Fri, Feb 28, 2020 at 02:29:09PM +0100, Jan Beulich wrote:
> On 19.02.2020 18:43, Roger Pau Monne wrote:
> > Current implementation of hvm_asid_flush_vcpu is not safe to use
> > unless the target vCPU is either paused or the currently running one,
> > as it modifies the generation without any locking.
> 
> Indeed, but the issue you're taking care of is highly theoretical:
> I don't think any sane compiler will split writes of the fields
> to multiple insns. It would be nice if this was made clear here.

What about adding:

> > Fix this by using atomic operations when accessing the generation
> > field, both in hvm_asid_flush_vcpu_asid and other ASID functions. This
> > allows to safely flush the current ASID generation. Note that for the
> > flush to take effect if the vCPU is currently running a vmexit is
> > required.

"Most compilers will already do such writes and reads as a single
instruction, so the usage of atomic operations is mostly used as a
safety measure."

Here?

> > Note the same could be achieved by introducing an extra field to
> > hvm_vcpu_asid that signals hvm_asid_handle_vmenter the need to call
> > hvm_asid_flush_vcpu on the given vCPU before vmentry, this however
> > seems unnecessary as hvm_asid_flush_vcpu itself only sets two vCPU
> > fields to 0, so there's no need to delay this to the vmentry ASID
> > helper.
> > 
> > This is not a bugfix as no callers that would violate the assumptions
> > listed in the first paragraph have been found, but a preparatory
> > change in order to allow remote flushing of HVM vCPUs.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > Reviewed-by: Wei Liu <wl@xen.org>
> 
> With a suitable clarification added
> Acked-by: Jan Beulich <jbeulich@suse.com>

Thanks, Roger.


* Re: [Xen-devel] [PATCH v5 1/7] x86/hvm: allow ASID flush when v != current
  2020-02-28 15:27     ` Roger Pau Monné
@ 2020-02-28 15:47       ` Jan Beulich
  2020-02-28 16:20         ` Roger Pau Monné
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2020-02-28 15:47 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Wei Liu, Andrew Cooper

On 28.02.2020 16:27, Roger Pau Monné wrote:
> On Fri, Feb 28, 2020 at 02:29:09PM +0100, Jan Beulich wrote:
>> On 19.02.2020 18:43, Roger Pau Monne wrote:
>>> Current implementation of hvm_asid_flush_vcpu is not safe to use
>>> unless the target vCPU is either paused or the currently running one,
>>> as it modifies the generation without any locking.
>>
>> Indeed, but the issue you're taking care of is highly theoretical:
>> I don't think any sane compiler will split writes of the fields
>> to multiple insns. It would be nice if this was made clear here.
> 
> What about adding:
> 
>>> Fix this by using atomic operations when accessing the generation
>>> field, both in hvm_asid_flush_vcpu_asid and other ASID functions. This
>>> allows to safely flush the current ASID generation. Note that for the
>>> flush to take effect if the vCPU is currently running a vmexit is
>>> required.
> 
> "Most compilers will already do such writes and reads as a single
> instruction, so the usage of atomic operations is mostly used as a
> safety measure."
> 
> Here?

Could you perhaps start with "Compilers will normally ..."? I'm fine
with the rest, it's just that "most compilers" still feels like
an understatement.

Jan


* Re: [Xen-devel] [PATCH v5 4/7] x86/tlb: introduce a flush guests TLB flag
  2020-02-19 17:43 ` [Xen-devel] [PATCH v5 4/7] x86/tlb: introduce a flush guests TLB flag Roger Pau Monne
@ 2020-02-28 16:14   ` Jan Beulich
  2020-02-28 16:50     ` Roger Pau Monné
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2020-02-28 16:14 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: George Dunlap, xen-devel, Tim Deegan, Wei Liu, Andrew Cooper

On 19.02.2020 18:43, Roger Pau Monne wrote:
> Introduce a specific flag to request a HVM guest TLB flush, which is
> an ASID/VPID tickle that forces a linear TLB flush for all HVM guests.

Here and below, what do you mean by "linear"? I guess you mean
TLBs holding translations from guest linear to guest physical,
but I think this could do with then also saying so, even if it's
more words.

> This was previously unconditionally done in each pre_flush call, but
> that's not required: HVM guests not using shadow don't require linear
> TLB flushes as Xen doesn't modify the guest page tables in that case
> (ie: when using HAP).

This explains the correctness in one direction. What about the
removal of this from the switch_cr3_cr4() path? And what about
our safety assumptions from the ticking of tlbflush_clock,
where we then imply that pages e.g. about to be freed can't
have any translations left in any TLBs anymore?

> --- a/xen/include/asm-x86/flushtlb.h
> +++ b/xen/include/asm-x86/flushtlb.h
> @@ -105,6 +105,8 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4);
>  #define FLUSH_VCPU_STATE 0x1000
>   /* Flush the per-cpu root page table */
>  #define FLUSH_ROOT_PGTBL 0x2000
> + /* Flush all HVM guests linear TLB (using ASID/VPID) */
> +#define FLUSH_GUESTS_TLB 0x4000

For one, the "all" is pretty misleading. A single such request
doesn't do this for all vCPU-s of all HVM guests, does it? I'm
also struggling with the 'S' in "GUESTS" - why is it not just
"GUEST"? I admit the names of the involved functions
(hvm_flush_guest_tlbs(), hvm_asid_flush_core()) are somewhat
misleading, as they don't actually do any flushing, they merely
arrange for what is in the TLB to no longer be able to be used,
so giving this a suitable name is "historically" complicated.
What if we did away with the hvm_flush_guest_tlbs() wrapper,
naming the constant here then after hvm_asid_flush_core(), e.g.
FLUSH_ASID_CORE?

I also think this constant might better be zero when !HVM.
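That is, something along the lines of (sketch of the suggestion, keeping
the current name for illustration):

    #ifdef CONFIG_HVM
    # define FLUSH_GUESTS_TLB 0x4000
    #else
    /* Nothing to invalidate in !HVM builds; make the flag a no-op. */
    # define FLUSH_GUESTS_TLB 0
    #endif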

Jan


* Re: [Xen-devel] [PATCH v5 2/7] x86/paging: add TLB flush hooks
  2020-02-28 14:50   ` Jan Beulich
@ 2020-02-28 16:19     ` Roger Pau Monné
  2020-02-28 16:40       ` Jan Beulich
  0 siblings, 1 reply; 33+ messages in thread
From: Roger Pau Monné @ 2020-02-28 16:19 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, xen-devel

On Fri, Feb 28, 2020 at 03:50:31PM +0100, Jan Beulich wrote:
> On 19.02.2020 18:43, Roger Pau Monne wrote:
> > Add shadow and hap implementation specific helpers to perform guest
> > TLB flushes. Note that the code for both is exactly the same at the
> > moment, and is copied from hvm_flush_vcpu_tlb. This will be changed by
> > further patches that will add implementation specific optimizations to
> > them.
> > 
> > No functional change intended.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > Reviewed-by: Wei Liu <wl@xen.org>
> > Acked-by: Tim Deegan <tim@xen.org>
> 
> This looks good in principle, with one possible anomaly:
> 
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -3990,55 +3990,10 @@ static void hvm_s3_resume(struct domain *d)
> >  bool hvm_flush_vcpu_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
> >                          void *ctxt)
> >  {
> > -    static DEFINE_PER_CPU(cpumask_t, flush_cpumask);
> > -    cpumask_t *mask = &this_cpu(flush_cpumask);
> > -    struct domain *d = current->domain;
> > -    struct vcpu *v;
> > -
> > -    /* Avoid deadlock if more than one vcpu tries this at the same time. */
> > -    if ( !spin_trylock(&d->hypercall_deadlock_mutex) )
> > -        return false;
> > -
> > -    /* Pause all other vcpus. */
> > -    for_each_vcpu ( d, v )
> > -        if ( v != current && flush_vcpu(ctxt, v) )
> > -            vcpu_pause_nosync(v);
> > -
> > -    /* Now that all VCPUs are signalled to deschedule, we wait... */
> > -    for_each_vcpu ( d, v )
> > -        if ( v != current && flush_vcpu(ctxt, v) )
> > -            while ( !vcpu_runnable(v) && v->is_running )
> > -                cpu_relax();
> > -
> > -    /* All other vcpus are paused, safe to unlock now. */
> > -    spin_unlock(&d->hypercall_deadlock_mutex);
> > -
> > -    cpumask_clear(mask);
> > -
> > -    /* Flush paging-mode soft state (e.g., va->gfn cache; PAE PDPE cache). */
> > -    for_each_vcpu ( d, v )
> > -    {
> > -        unsigned int cpu;
> > -
> > -        if ( !flush_vcpu(ctxt, v) )
> > -            continue;
> > -
> > -        paging_update_cr3(v, false);
> > +    struct domain *currd = current->domain;
> >  
> > -        cpu = read_atomic(&v->dirty_cpu);
> > -        if ( is_vcpu_dirty_cpu(cpu) )
> > -            __cpumask_set_cpu(cpu, mask);
> > -    }
> > -
> > -    /* Flush TLBs on all CPUs with dirty vcpu state. */
> > -    flush_tlb_mask(mask);
> > -
> > -    /* Done. */
> > -    for_each_vcpu ( d, v )
> > -        if ( v != current && flush_vcpu(ctxt, v) )
> > -            vcpu_unpause(v);
> > -
> > -    return true;
> > +    return shadow_mode_enabled(currd) ? shadow_flush_tlb(flush_vcpu, ctxt)
> > +                                      : hap_flush_tlb(flush_vcpu, ctxt);
> >  }
> 
> Following our current model I think this should be a new pointer
> in struct paging_mode (then truly fitting "hooks" in the title).

I tried doing it that way, but there was something weird about it: the
paging mode is per-vCPU, and hence I needed to do something like:

paging_get_hostmode(current)->flush(current->domain, ...)

I can try to move it to being a paging_mode hook if you prefer.

> I can see the desire to avoid the indirect call though, but I
> also think that if we were to go that route, we should settle on
> switching around others as well which are paging mode dependent.
> (FAOD this is nothing I ask you to do here.) Andrew, thoughts?

I think it's already quite a mixed bag, see track_dirty_vram for
example, which uses a similar model.

Thanks, Roger.


* Re: [Xen-devel] [PATCH v5 1/7] x86/hvm: allow ASID flush when v != current
  2020-02-28 15:47       ` Jan Beulich
@ 2020-02-28 16:20         ` Roger Pau Monné
  0 siblings, 0 replies; 33+ messages in thread
From: Roger Pau Monné @ 2020-02-28 16:20 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Wei Liu, Andrew Cooper

On Fri, Feb 28, 2020 at 04:47:57PM +0100, Jan Beulich wrote:
> On 28.02.2020 16:27, Roger Pau Monné wrote:
> > On Fri, Feb 28, 2020 at 02:29:09PM +0100, Jan Beulich wrote:
> >> On 19.02.2020 18:43, Roger Pau Monne wrote:
> >>> Current implementation of hvm_asid_flush_vcpu is not safe to use
> >>> unless the target vCPU is either paused or the currently running one,
> >>> as it modifies the generation without any locking.
> >>
> >> Indeed, but the issue you're taking care of is highly theoretical:
> >> I don't think any sane compiler will split writes of the fields
> >> to multiple insns. It would be nice if this was made clear here.
> > 
> > What about adding:
> > 
> >>> Fix this by using atomic operations when accessing the generation
> >>> field, both in hvm_asid_flush_vcpu_asid and other ASID functions. This
> >>> allows to safely flush the current ASID generation. Note that for the
> >>> flush to take effect if the vCPU is currently running a vmexit is
> >>> required.
> > 
> > "Most compilers will already do such writes and reads as a single
> > instruction, so the usage of atomic operations is mostly used as a
> > safety measure."
> > 
> > Here?
> 
> Could you perhaps start with "Compilers will normally ..."? I'm fine
> with the rest, it's just that "most compilers" still feels like
> an understatement.

Sure, that's fine.

Thanks, Roger.


* Re: [Xen-devel] [PATCH v5 5/7] x86/tlb: allow disabling the TLB clock
  2020-02-19 17:43 ` [Xen-devel] [PATCH v5 5/7] x86/tlb: allow disabling the TLB clock Roger Pau Monne
@ 2020-02-28 16:25   ` Jan Beulich
  0 siblings, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2020-02-28 16:25 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Wei Liu, Andrew Cooper

On 19.02.2020 18:43, Roger Pau Monne wrote:
> The TLB clock is helpful when running Xen on bare metal because when
> doing a TLB flush each CPU is IPI'ed and can keep a timestamp of the
> last flush.
> 
> This is not the case however when Xen is running virtualized, and the
> underlying hypervisor provides mechanism to assist in performing TLB
> flushes: Xen itself for example offers a HVMOP_flush_tlbs hypercall in
> order to perform a TLB flush without having to IPI each CPU. When
> using such mechanisms it's no longer possible to keep a timestamp of
> the flushes on each CPU, as they are performed by the underlying
> hypervisor.
> 
> Offer a boolean in order to signal Xen that the timestamped TLB
> shouldn't be used. This avoids keeping the timestamps of the flushes,
> and also forces NEED_FLUSH to always return true.
> 
> No functional change intended, as this change doesn't introduce any
> user that disables the timestamped TLB.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Reviewed-by: Wei Liu <wl@xen.org>

Acked-by: Jan Beulich <jbeulich@suse.com>
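
For readers following the thread, the mechanism described in the commit
message boils down to roughly this (simplified sketch, not the literal
patch):

    /* Global switch for the timestamped TLB. When clear, no per-CPU flush
     * timestamps are kept and NEED_FLUSH() degrades to "always flush". */
    bool __read_mostly tlb_clk_enabled = true;

    static inline bool NEED_FLUSH(uint32_t cpu_stamp, uint32_t lastuse_stamp)
    {
        uint32_t curr_time = tlbflush_current_time();

        if ( !tlb_clk_enabled )
            return true;

        return curr_time == 0 ||
               (cpu_stamp <= lastuse_stamp && lastuse_stamp <= curr_time);
    }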



* Re: [Xen-devel] [PATCH v5 6/7] xen/guest: prepare hypervisor ops to use alternative calls
  2020-02-19 17:43 ` [Xen-devel] [PATCH v5 6/7] xen/guest: prepare hypervisor ops to use alternative calls Roger Pau Monne
@ 2020-02-28 16:29   ` Jan Beulich
  2020-02-28 16:52     ` Roger Pau Monné
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2020-02-28 16:29 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Paul Durrant, Wei Liu, Andrew Cooper

On 19.02.2020 18:43, Roger Pau Monne wrote:
> --- a/xen/arch/x86/guest/hyperv/hyperv.c
> +++ b/xen/arch/x86/guest/hyperv/hyperv.c
> @@ -199,7 +199,7 @@ static void __init e820_fixup(struct e820map *e820)
>          panic("Unable to reserve Hyper-V hypercall range\n");
>  }
>  
> -static const struct hypervisor_ops ops = {
> +static const struct hypervisor_ops __initdata ops = {

This needs to be __initconstrel in order to avoid triggering
(possibly only in the future) section mismatch warnings with
at least some gcc versions. With this and the other instance
adjusted
Reviewed-by: Jan Beulich <jbeulich@suse.com>
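
That is, for relocatable const data only referenced from init code,
something like (sketch; the structure contents are elided to the name
field for brevity):

    static const struct hypervisor_ops __initconstrel ops = {
        .name = "Hyper-V",
        /* ... remaining hooks unchanged ... */
    };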

Jan


* Re: [Xen-devel] [PATCH v5 3/7] x86/hap: improve hypervisor assisted guest TLB flush
  2020-02-28 13:58   ` Jan Beulich
@ 2020-02-28 16:31     ` Roger Pau Monné
  2020-02-28 16:44       ` Jan Beulich
  0 siblings, 1 reply; 33+ messages in thread
From: Roger Pau Monné @ 2020-02-28 16:31 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, xen-devel, Wei Liu, Andrew Cooper

On Fri, Feb 28, 2020 at 02:58:42PM +0100, Jan Beulich wrote:
> On 19.02.2020 18:43, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/mm/hap/hap.c
> > +++ b/xen/arch/x86/mm/hap/hap.c
> > @@ -669,32 +669,28 @@ static void hap_update_cr3(struct vcpu *v, int do_locking, bool noflush)
> >      hvm_update_guest_cr3(v, noflush);
> >  }
> >  
> > +/*
> > + * NB: doesn't actually perform any flush, used just to clear the CPU from the
> > + * mask and hence signal that the guest TLB flush has been done.
> > + */
> 
> "has been done" isn't correct, as the flush may happen only later
> on (upon next VM entry). I think wording here needs to be as
> precise as possible, however the comment may turn out unnecessary
> altogether:

What about:

/*
 * NB: Dummy function exclusively used as a way to trigger a vmexit,
 * and thus force an ASID/VPID update on vmentry (that will flush the
 * cache).
 */

> > @@ -705,20 +701,22 @@ bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
> >          if ( !flush_vcpu(ctxt, v) )
> >              continue;
> >  
> > -        paging_update_cr3(v, false);
> > +        hvm_asid_flush_vcpu(v);
> >  
> >          cpu = read_atomic(&v->dirty_cpu);
> > -        if ( is_vcpu_dirty_cpu(cpu) )
> > +        if ( cpu != this_cpu && is_vcpu_dirty_cpu(cpu) )
> >              __cpumask_set_cpu(cpu, mask);
> >      }
> >  
> > -    /* Flush TLBs on all CPUs with dirty vcpu state. */
> > -    flush_tlb_mask(mask);
> > -
> > -    /* Done. */
> > -    for_each_vcpu ( d, v )
> > -        if ( v != current && flush_vcpu(ctxt, v) )
> > -            vcpu_unpause(v);
> > +    /*
> > +     * Trigger a vmexit on all pCPUs with dirty vCPU state in order to force an
> > +     * ASID/VPID change and hence accomplish a guest TLB flush. Note that vCPUs
> > +     * not currently running will already be flushed when scheduled because of
> > +     * the ASID tickle done in the loop above.
> > +     */
> > +    on_selected_cpus(mask, handle_flush, mask, 0);
> > +    while ( !cpumask_empty(mask) )
> > +        cpu_relax();
> 
> Why do you pass 0 as last argument? If you passed 1, you wouldn't
> need the while() here, handle_flush() could be empty, and you'd
> save a perhaps large amount of CPUs to all try to modify two
> cache lines (the one used by on_selected_cpus() itself and the
> one here) instead of just one. Yes, latency from the last CPU
> clearing its bit to you being able to exit from here may be a
> little larger this way, but overall latency may shrink with the
> cut by half amount of atomic ops issued to the bus.

In fact I think passing 0 as the last argument is fine, and
handle_flush can be empty in that case anyway. We don't really care
whether on_selected_cpus returns before all CPUs have executed the
dummy function, as long as all of them have taken a vmexit. Using 0
already guarantees that AFAICT.

> (Forcing an empty function to be called may seem odd, but sending
>  just some IPI [like the event check one] wouldn't be enough, as
>  you want to be sure the other side has actually received the
>  request.)

Exactly, that's the reason for the empty function.

Thanks, Roger.


* Re: [Xen-devel] [PATCH v5 2/7] x86/paging: add TLB flush hooks
  2020-02-28 16:19     ` Roger Pau Monné
@ 2020-02-28 16:40       ` Jan Beulich
  0 siblings, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2020-02-28 16:40 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: George Dunlap, Andrew Cooper, Tim Deegan, Wei Liu, xen-devel

On 28.02.2020 17:19, Roger Pau Monné wrote:
> On Fri, Feb 28, 2020 at 03:50:31PM +0100, Jan Beulich wrote:
>> On 19.02.2020 18:43, Roger Pau Monne wrote:
>>> Add shadow and hap implementation specific helpers to perform guest
>>> TLB flushes. Note that the code for both is exactly the same at the
>>> moment, and is copied from hvm_flush_vcpu_tlb. This will be changed by
>>> further patches that will add implementation specific optimizations to
>>> them.
>>>
>>> No functional change intended.
>>>
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> Reviewed-by: Wei Liu <wl@xen.org>
>>> Acked-by: Tim Deegan <tim@xen.org>
>>
>> This looks good in principle, with one possible anomaly:
>>
>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -3990,55 +3990,10 @@ static void hvm_s3_resume(struct domain *d)
>>>  bool hvm_flush_vcpu_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
>>>                          void *ctxt)
>>>  {
>>> -    static DEFINE_PER_CPU(cpumask_t, flush_cpumask);
>>> -    cpumask_t *mask = &this_cpu(flush_cpumask);
>>> -    struct domain *d = current->domain;
>>> -    struct vcpu *v;
>>> -
>>> -    /* Avoid deadlock if more than one vcpu tries this at the same time. */
>>> -    if ( !spin_trylock(&d->hypercall_deadlock_mutex) )
>>> -        return false;
>>> -
>>> -    /* Pause all other vcpus. */
>>> -    for_each_vcpu ( d, v )
>>> -        if ( v != current && flush_vcpu(ctxt, v) )
>>> -            vcpu_pause_nosync(v);
>>> -
>>> -    /* Now that all VCPUs are signalled to deschedule, we wait... */
>>> -    for_each_vcpu ( d, v )
>>> -        if ( v != current && flush_vcpu(ctxt, v) )
>>> -            while ( !vcpu_runnable(v) && v->is_running )
>>> -                cpu_relax();
>>> -
>>> -    /* All other vcpus are paused, safe to unlock now. */
>>> -    spin_unlock(&d->hypercall_deadlock_mutex);
>>> -
>>> -    cpumask_clear(mask);
>>> -
>>> -    /* Flush paging-mode soft state (e.g., va->gfn cache; PAE PDPE cache). */
>>> -    for_each_vcpu ( d, v )
>>> -    {
>>> -        unsigned int cpu;
>>> -
>>> -        if ( !flush_vcpu(ctxt, v) )
>>> -            continue;
>>> -
>>> -        paging_update_cr3(v, false);
>>> +    struct domain *currd = current->domain;
>>>  
>>> -        cpu = read_atomic(&v->dirty_cpu);
>>> -        if ( is_vcpu_dirty_cpu(cpu) )
>>> -            __cpumask_set_cpu(cpu, mask);
>>> -    }
>>> -
>>> -    /* Flush TLBs on all CPUs with dirty vcpu state. */
>>> -    flush_tlb_mask(mask);
>>> -
>>> -    /* Done. */
>>> -    for_each_vcpu ( d, v )
>>> -        if ( v != current && flush_vcpu(ctxt, v) )
>>> -            vcpu_unpause(v);
>>> -
>>> -    return true;
>>> +    return shadow_mode_enabled(currd) ? shadow_flush_tlb(flush_vcpu, ctxt)
>>> +                                      : hap_flush_tlb(flush_vcpu, ctxt);
>>>  }
>>
>> Following our current model I think this should be a new pointer
>> in struct paging_mode (then truly fitting "hooks" in the title).
> 
> I tried doing it that way, but there was something weird about it: the
> paging mode is per-vCPU, and hence I needed to do something like:
> 
> paging_get_hostmode(current)->flush(current->domain, ...)

I don't see anything wrong with the left side of the -> (it
parallels what is needed for the write_p2m_entry() hook). For
the right I can't see why you'd want to have current->domain
there when both functions want flush_vcpu and ctxt. Ultimately
we probably want per-vCPU and per-domain hooks in separate
structures (the former for hooks where the current paging mode
matters, the latter for those where it doesn't matter), but of
course that's nothing I'm meaning to ask you to do.

> I can try to move it to being a paging_mode hook if you prefer.

It would seem cleaner to me, but ...

>> I can see the desire to avoid the indirect call though, but I
>> also think that if we were to go that route, we should settle on
>> switching around others as well which are paging mode dependent.
>> (FAOD this is nothing I ask you to do here.) Andrew, thoughts?

... as said I'd prefer to also know Andrew's opinion, in
particular to settle where we would want this to move in the
mid to long term. Whereas ...

> I think it's already quite a mixed bag, see track_dirty_vram for
> example, which uses a similar model.

... you probably know my typical response to something like
this: Bad examples aren't to be taken as excuse to introduce
further inconsistencies.

Jan


* Re: [Xen-devel] [PATCH v5 3/7] x86/hap: improve hypervisor assisted guest TLB flush
  2020-02-28 16:31     ` Roger Pau Monné
@ 2020-02-28 16:44       ` Jan Beulich
  2020-02-28 16:57         ` Roger Pau Monné
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2020-02-28 16:44 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: George Dunlap, xen-devel, Wei Liu, Andrew Cooper

On 28.02.2020 17:31, Roger Pau Monné wrote:
> On Fri, Feb 28, 2020 at 02:58:42PM +0100, Jan Beulich wrote:
>> On 19.02.2020 18:43, Roger Pau Monne wrote:
>>> --- a/xen/arch/x86/mm/hap/hap.c
>>> +++ b/xen/arch/x86/mm/hap/hap.c
>>> @@ -669,32 +669,28 @@ static void hap_update_cr3(struct vcpu *v, int do_locking, bool noflush)
>>>      hvm_update_guest_cr3(v, noflush);
>>>  }
>>>  
>>> +/*
>>> + * NB: doesn't actually perform any flush, used just to clear the CPU from the
>>> + * mask and hence signal that the guest TLB flush has been done.
>>> + */
>>
>> "has been done" isn't correct, as the flush may happen only later
>> on (upon next VM entry). I think wording here needs to be as
>> precise as possible, however the comment may turn out unnecessary
>> altogether:
> 
> What about:
> 
> /*
>  * NB: Dummy function exclusively used as a way to trigger a vmexit,
>  * and thus force an ASID/VPID update on vmentry (that will flush the
>  * cache).
>  */

Once it's empty - yes, looks okay (with s/cache/TLB/).

>>> @@ -705,20 +701,22 @@ bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
>>>          if ( !flush_vcpu(ctxt, v) )
>>>              continue;
>>>  
>>> -        paging_update_cr3(v, false);
>>> +        hvm_asid_flush_vcpu(v);
>>>  
>>>          cpu = read_atomic(&v->dirty_cpu);
>>> -        if ( is_vcpu_dirty_cpu(cpu) )
>>> +        if ( cpu != this_cpu && is_vcpu_dirty_cpu(cpu) )
>>>              __cpumask_set_cpu(cpu, mask);
>>>      }
>>>  
>>> -    /* Flush TLBs on all CPUs with dirty vcpu state. */
>>> -    flush_tlb_mask(mask);
>>> -
>>> -    /* Done. */
>>> -    for_each_vcpu ( d, v )
>>> -        if ( v != current && flush_vcpu(ctxt, v) )
>>> -            vcpu_unpause(v);
>>> +    /*
>>> +     * Trigger a vmexit on all pCPUs with dirty vCPU state in order to force an
>>> +     * ASID/VPID change and hence accomplish a guest TLB flush. Note that vCPUs
>>> +     * not currently running will already be flushed when scheduled because of
>>> +     * the ASID tickle done in the loop above.
>>> +     */
>>> +    on_selected_cpus(mask, handle_flush, mask, 0);
>>> +    while ( !cpumask_empty(mask) )
>>> +        cpu_relax();
>>
>> Why do you pass 0 as last argument? If you passed 1, you wouldn't
>> need the while() here, handle_flush() could be empty, and you'd
>> save a perhaps large amount of CPUs to all try to modify two
>> cache lines (the one used by on_selected_cpus() itself and the
>> one here) instead of just one. Yes, latency from the last CPU
>> clearing its bit to you being able to exit from here may be a
>> little larger this way, but overall latency may shrink with the
>> cut by half amount of atomic ops issued to the bus.
> 
> In fact I think passing 0 as the last argument is fine, and
> handle_flush can be empty in that case anyway. We don't really care
> whether on_selected_cpus returns before all CPUs have executed the
> dummy function, as long as all of them have taken a vmexit. Using 0
> already guarantees that AFAICT.

Isn't it that the caller wants to be guaranteed that the flush
has actually occurred (or at least that no further insns can
be executed prior to the flush happening, i.e. at least the VM
exit having occurred)?

>> (Forcing an empty function to be called may seem odd, but sending
>>  just some IPI [like the event check one] wouldn't be enough, as
>>  you want to be sure the other side has actually received the
>>  request.)
> 
> Exactly, that's the reason for the empty function.

But the function isn't empty.

Jan


* Re: [Xen-devel] [PATCH v5 4/7] x86/tlb: introduce a flush guests TLB flag
  2020-02-28 16:14   ` Jan Beulich
@ 2020-02-28 16:50     ` Roger Pau Monné
  2020-03-02 10:31       ` Jan Beulich
  0 siblings, 1 reply; 33+ messages in thread
From: Roger Pau Monné @ 2020-02-28 16:50 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, xen-devel, Tim Deegan, Wei Liu, Andrew Cooper

On Fri, Feb 28, 2020 at 05:14:05PM +0100, Jan Beulich wrote:
> On 19.02.2020 18:43, Roger Pau Monne wrote:
> > Introduce a specific flag to request a HVM guest TLB flush, which is
> > an ASID/VPID tickle that forces a linear TLB flush for all HVM guests.
> 
> Here and below, what do you mean by "linear"? I guess you mean
> TLBs holding translations from guest linear to guest physical,
> but I think this could do with then also saying so, even if it's
> more words.

Yes, will change.

> > This was previously unconditionally done in each pre_flush call, but
> > that's not required: HVM guests not using shadow don't require linear
> > TLB flushes as Xen doesn't modify the guest page tables in that case
> > (ie: when using HAP).
> 
> This explains the correctness in one direction. What about the
> removal of this from the switch_cr3_cr4() path?

AFAICT that's never used by shadow code to modify cr3 or cr4, and
hence doesn't require a guest linear TLB flush.

> And what about
> our safety assumptions from the ticking of tlbflush_clock,
> where we then imply that pages e.g. about to be freed can't
> have any translations left in any TLBs anymore?

I'm slightly confused. That flush only affects HVM guests' linear TLBs,
and hence is not under Xen control unless shadow mode is used. Pages
to be freed in the HAP case need to be flushed from the EPT/NPT, but
whether there are references left in the guest TLB to point to that
gfn really doesn't matter to Xen at all, since the guest is in full
control of its MMU and TLB in that case.

For shadow any change in the guest page-tables should already be
accompanied by a guest TLB flush, or else the guest could access no
longer present entries, regardless of whether the affected pages are
freed or not.

> > --- a/xen/include/asm-x86/flushtlb.h
> > +++ b/xen/include/asm-x86/flushtlb.h
> > @@ -105,6 +105,8 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4);
> >  #define FLUSH_VCPU_STATE 0x1000
> >   /* Flush the per-cpu root page table */
> >  #define FLUSH_ROOT_PGTBL 0x2000
> > + /* Flush all HVM guests linear TLB (using ASID/VPID) */
> > +#define FLUSH_GUESTS_TLB 0x4000
> 
> For one, the "all" is pretty misleading. A single such request
> doesn't do this for all vCPU-s of all HVM guests, does it?

It kind of does because it tickles the pCPU ASID/VPID generation ID,
so any vCPU scheduled on the selected pCPUs will get a new ASID/VPID
allocated and thus a clean TLB.
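
The "tickle" is essentially a per-pCPU generation bump, along these lines
(heavily simplified sketch of what hvm_asid_flush_core() amounts to;
generation wrap handling and the ASID-disabled case are omitted):

    static DEFINE_PER_CPU(uint64_t, core_asid_generation);

    void hvm_asid_flush_core(void)
    {
        /* Invalidate every ASID handed out on this pCPU so far; the next
         * vmentry of any vCPU scheduled here then allocates a fresh
         * ASID/VPID and thus runs on a clean guest TLB. */
        this_cpu(core_asid_generation)++;
    }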

> I'm
> also struggling with the 'S' in "GUESTS" - why is it not just
> "GUEST"? 

Any guest vCPUs running on the selected pCPUs will get a new ASID/VPID
ID and thus a clean TLB.

> I admit the names of the involved functions
> (hvm_flush_guest_tlbs(), hvm_asid_flush_core()) are somewhat
> misleading, as they don't actually do any flushing, they merely
> arrange for what is in the TLB to no longer be able to be used,
> so giving this a suitable name is "historically" complicated.
> What if we did away with the hvm_flush_guest_tlbs() wrapper,
> naming the constant here then after hvm_asid_flush_core(), e.g.
> FLUSH_ASID_CORE?

I'm not opposed to renaming. The comment before the definition was
also meant to clarify it's usage, and hence the explicit mention of
ASID/VPID.

> I also think this constant might better be zero when !HVM.

Yes, that's a good idea.

Thanks, Roger.


* Re: [Xen-devel] [PATCH v5 6/7] xen/guest: prepare hypervisor ops to use alternative calls
  2020-02-28 16:29   ` Jan Beulich
@ 2020-02-28 16:52     ` Roger Pau Monné
  2020-03-02 10:43       ` Jan Beulich
  0 siblings, 1 reply; 33+ messages in thread
From: Roger Pau Monné @ 2020-02-28 16:52 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Paul Durrant, Wei Liu, Andrew Cooper

On Fri, Feb 28, 2020 at 05:29:32PM +0100, Jan Beulich wrote:
> On 19.02.2020 18:43, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/guest/hyperv/hyperv.c
> > +++ b/xen/arch/x86/guest/hyperv/hyperv.c
> > @@ -199,7 +199,7 @@ static void __init e820_fixup(struct e820map *e820)
> >          panic("Unable to reserve Hyper-V hypercall range\n");
> >  }
> >  
> > -static const struct hypervisor_ops ops = {
> > +static const struct hypervisor_ops __initdata ops = {
> 
> This needs to be __initconstrel in order to avoid triggering
> (possibly only in the future) section mismatch warnings with
> at least some gcc versions. With this and the other instance
> adjusted

I can do that when posting a new version, unless you want to pick this
earlier and adjust on commit.

> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks, Roger.


* Re: [Xen-devel] [PATCH v5 3/7] x86/hap: improve hypervisor assisted guest TLB flush
  2020-02-28 16:44       ` Jan Beulich
@ 2020-02-28 16:57         ` Roger Pau Monné
  2020-02-28 17:02           ` Jan Beulich
  0 siblings, 1 reply; 33+ messages in thread
From: Roger Pau Monné @ 2020-02-28 16:57 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, xen-devel, Wei Liu, Andrew Cooper

On Fri, Feb 28, 2020 at 05:44:57PM +0100, Jan Beulich wrote:
> On 28.02.2020 17:31, Roger Pau Monné wrote:
> > On Fri, Feb 28, 2020 at 02:58:42PM +0100, Jan Beulich wrote:
> >> On 19.02.2020 18:43, Roger Pau Monne wrote:
> >>> --- a/xen/arch/x86/mm/hap/hap.c
> >>> +++ b/xen/arch/x86/mm/hap/hap.c
> >>> @@ -669,32 +669,28 @@ static void hap_update_cr3(struct vcpu *v, int do_locking, bool noflush)
> >>>      hvm_update_guest_cr3(v, noflush);
> >>>  }
> >>>  
> >>> +/*
> >>> + * NB: doesn't actually perform any flush, used just to clear the CPU from the
> >>> + * mask and hence signal that the guest TLB flush has been done.
> >>> + */
> >>
> >> "has been done" isn't correct, as the flush may happen only later
> >> on (upon next VM entry). I think wording here needs to be as
> >> precise as possible, however the comment may turn out unnecessary
> >> altogether:
> > 
> > What about:
> > 
> > /*
> >  * NB: Dummy function exclusively used as a way to trigger a vmexit,
> >  * and thus force an ASID/VPID update on vmentry (that will flush the
> >  * cache).
> >  */
> 
> Once it's empty - yes, looks okay (with s/cache/TLB/).

Ack.

> >>> @@ -705,20 +701,22 @@ bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
> >>>          if ( !flush_vcpu(ctxt, v) )
> >>>              continue;
> >>>  
> >>> -        paging_update_cr3(v, false);
> >>> +        hvm_asid_flush_vcpu(v);
> >>>  
> >>>          cpu = read_atomic(&v->dirty_cpu);
> >>> -        if ( is_vcpu_dirty_cpu(cpu) )
> >>> +        if ( cpu != this_cpu && is_vcpu_dirty_cpu(cpu) )
> >>>              __cpumask_set_cpu(cpu, mask);
> >>>      }
> >>>  
> >>> -    /* Flush TLBs on all CPUs with dirty vcpu state. */
> >>> -    flush_tlb_mask(mask);
> >>> -
> >>> -    /* Done. */
> >>> -    for_each_vcpu ( d, v )
> >>> -        if ( v != current && flush_vcpu(ctxt, v) )
> >>> -            vcpu_unpause(v);
> >>> +    /*
> >>> +     * Trigger a vmexit on all pCPUs with dirty vCPU state in order to force an
> >>> +     * ASID/VPID change and hence accomplish a guest TLB flush. Note that vCPUs
> >>> +     * not currently running will already be flushed when scheduled because of
> >>> +     * the ASID tickle done in the loop above.
> >>> +     */
> >>> +    on_selected_cpus(mask, handle_flush, mask, 0);
> >>> +    while ( !cpumask_empty(mask) )
> >>> +        cpu_relax();
> >>
> >> Why do you pass 0 as last argument? If you passed 1, you wouldn't
> >> need the while() here, handle_flush() could be empty, and you'd
> >> save a perhaps large amount of CPUs to all try to modify two
> >> cache lines (the one used by on_selected_cpus() itself and the
> >> one here) instead of just one. Yes, latency from the last CPU
> >> clearing its bit to you being able to exit from here may be a
> >> little larger this way, but overall latency may shrink with the
> >> cut by half amount of atomic ops issued to the bus.
> > 
> > In fact I think passing 0 as the last argument is fine, and
> > handle_flush can be empty in that case anyway. We don't really care
> > whether on_selected_cpus returns before all CPUs have executed the
> > dummy function, as long as all of them have taken a vmexit. Using 0
> > already guarantees that AFAICT.
> 
> Isn't it that the caller wants to be guaranteed that the flush
> has actually occurred (or at least that no further insns can
> be executed prior to the flush happening, i.e. at least the VM
> exit having occurred)?

Yes, but that would happen with 0 as the last argument already, see
the code in smp_call_function_interrupt:

    if ( call_data.wait )
    {
        (*func)(info);
        smp_mb();
        cpumask_clear_cpu(cpu, &call_data.selected);
    }
    else
    {
        smp_mb();
        cpumask_clear_cpu(cpu, &call_data.selected);
        (*func)(info);
    }

The only difference is whether on_selected_cpus can return before all
the functions have been executed on all CPUs, or whether all CPUs have
taken a vmexit. The latter is fine for our use-case.

> >> (Forcing an empty function to be called may seem odd, but sending
> >>  just some IPI [like the event check one] wouldn't be enough, as
> >>  you want to be sure the other side has actually received the
> >>  request.)
> > 
> > Exactly, that's the reason for the empty function.
> 
> But the function isn't empty.

It will be now, since on_selected_cpus already does the needed
synchronization as you pointed out.
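
I.e. the end result would look roughly like this (sketch of where the
discussion lands, not the code as posted in v5):

    /*
     * Dummy handler: the IPI and the resulting vmexit on each target pCPU
     * are the whole point, as the ASID/VPID update (and hence the guest
     * TLB flush) then happens on the next vmentry.
     */
    static void handle_flush(void *data)
    {
    }

    /* on_selected_cpus() only returns once every pCPU in mask has entered
     * the handler, i.e. has taken the vmexit, so no extra wait loop is
     * needed. */
    on_selected_cpus(mask, handle_flush, NULL, 0);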

Thanks, Roger.


* Re: [Xen-devel] [PATCH v5 7/7] x86/tlb: use Xen L0 assisted TLB flush when available
  2020-02-19 17:43 ` [Xen-devel] [PATCH v5 7/7] x86/tlb: use Xen L0 assisted TLB flush when available Roger Pau Monne
@ 2020-02-28 17:00   ` Jan Beulich
  2020-02-28 17:23     ` Roger Pau Monné
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2020-02-28 17:00 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Wei Liu, Andrew Cooper

On 19.02.2020 18:43, Roger Pau Monne wrote:
> Use Xen's L0 HVMOP_flush_tlbs hypercall in order to perform flushes.
> This greatly increases the performance of TLB flushes when running
> with a high amount of vCPUs as a Xen guest, and is specially important
> when running in shim mode.
> 
> The following figures are from a PV guest running `make -j32 xen` in
> shim mode with 32 vCPUs and HAP.
> 
> Using x2APIC and ALLBUT shorthand:
> real	4m35.973s
> user	4m35.110s
> sys	36m24.117s
> 
> Using L0 assisted flush:
> real    1m2.596s
> user    4m34.818s
> sys     5m16.374s
> 
> The implementation adds a new hook to hypervisor_ops so other
> enlightenments can also implement such assisted flush just by filling
> the hook. Note that the Xen implementation completely ignores the
> dirty CPU mask and the linear address passed in, and always performs a
> global TLB flush on all vCPUs.

This isn't because of an implementation choice of yours, but because
of how HVMOP_flush_tlbs works. I think the statement should somehow
express this. I also think it wants clarifying that using the
hypercall is indeed faster even in the case of single-page, single-
CPU flush (which I suspect may not be the case especially as vCPU
count grows). The stats above prove a positive overall effect, but
they don't say whether the effect could be even bigger by being at
least a little selective.
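
For readers, the shape of the addition is roughly the following (sketch;
the hypercall wrapper name is an assumption based on Xen's existing guest
hypercall helpers, and as noted the arguments are ignored because
HVMOP_flush_tlbs always flushes all vCPUs of the calling guest):

    /* New hook in struct hypervisor_ops (sketch): */
    int (*flush_tlb)(const cpumask_t *mask, const void *va,
                     unsigned int order);

    /* Xen-on-Xen backend (sketch): */
    static int flush_tlb(const cpumask_t *mask, const void *va,
                         unsigned int order)
    {
        return xen_hypercall_hvm_op(HVMOP_flush_tlbs, NULL);
    }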

> @@ -73,6 +74,15 @@ void __init hypervisor_e820_fixup(struct e820map *e820)
>          ops.e820_fixup(e820);
>  }
>  
> +int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
> +                         unsigned int order)
> +{
> +    if ( ops.flush_tlb )
> +        return alternative_call(ops.flush_tlb, mask, va, order);
> +
> +    return -ENOSYS;
> +}

Please no new -ENOSYS anywhere (except in new ports' top level
hypercall handlers).

> @@ -256,6 +257,16 @@ void flush_area_mask(const cpumask_t *mask, const void *va, unsigned int flags)
>      if ( (flags & ~FLUSH_ORDER_MASK) &&
>           !cpumask_subset(mask, cpumask_of(cpu)) )
>      {
> +        if ( cpu_has_hypervisor &&
> +             !(flags & ~(FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_VA_VALID |
> +                         FLUSH_ORDER_MASK)) &&
> +             !hypervisor_flush_tlb(mask, va, (flags - 1) & FLUSH_ORDER_MASK) )
> +        {
> +            if ( tlb_clk_enabled )
> +                tlb_clk_enabled = false;

Why does this need doing here? Couldn't Xen guest setup code
clear the flag?

Jan


* Re: [Xen-devel] [PATCH v5 3/7] x86/hap: improve hypervisor assisted guest TLB flush
  2020-02-28 16:57         ` Roger Pau Monné
@ 2020-02-28 17:02           ` Jan Beulich
  0 siblings, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2020-02-28 17:02 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: George Dunlap, xen-devel, Wei Liu, Andrew Cooper

On 28.02.2020 17:57, Roger Pau Monné wrote:
> On Fri, Feb 28, 2020 at 05:44:57PM +0100, Jan Beulich wrote:
>> On 28.02.2020 17:31, Roger Pau Monné wrote:
>>> On Fri, Feb 28, 2020 at 02:58:42PM +0100, Jan Beulich wrote:
>>>> On 19.02.2020 18:43, Roger Pau Monne wrote:
>>>>> @@ -705,20 +701,22 @@ bool hap_flush_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
>>>>>          if ( !flush_vcpu(ctxt, v) )
>>>>>              continue;
>>>>>  
>>>>> -        paging_update_cr3(v, false);
>>>>> +        hvm_asid_flush_vcpu(v);
>>>>>  
>>>>>          cpu = read_atomic(&v->dirty_cpu);
>>>>> -        if ( is_vcpu_dirty_cpu(cpu) )
>>>>> +        if ( cpu != this_cpu && is_vcpu_dirty_cpu(cpu) )
>>>>>              __cpumask_set_cpu(cpu, mask);
>>>>>      }
>>>>>  
>>>>> -    /* Flush TLBs on all CPUs with dirty vcpu state. */
>>>>> -    flush_tlb_mask(mask);
>>>>> -
>>>>> -    /* Done. */
>>>>> -    for_each_vcpu ( d, v )
>>>>> -        if ( v != current && flush_vcpu(ctxt, v) )
>>>>> -            vcpu_unpause(v);
>>>>> +    /*
>>>>> +     * Trigger a vmexit on all pCPUs with dirty vCPU state in order to force an
>>>>> +     * ASID/VPID change and hence accomplish a guest TLB flush. Note that vCPUs
>>>>> +     * not currently running will already be flushed when scheduled because of
>>>>> +     * the ASID tickle done in the loop above.
>>>>> +     */
>>>>> +    on_selected_cpus(mask, handle_flush, mask, 0);
>>>>> +    while ( !cpumask_empty(mask) )
>>>>> +        cpu_relax();
>>>>
>>>> Why do you pass 0 as last argument? If you passed 1, you wouldn't
>>>> need the while() here, handle_flush() could be empty, and you'd
>>>> save a perhaps large amount of CPUs to all try to modify two
>>>> cache lines (the one used by on_selected_cpus() itself and the
>>>> one here) instead of just one. Yes, latency from the last CPU
>>>> clearing its bit to you being able to exit from here may be a
>>>> little larger this way, but overall latency may shrink with the
>>>> cut by half amount of atomic ops issued to the bus.
>>>
>>> In fact I think passing 0 as the last argument is fine, and
>>> handle_flush can be empty in that case anyway. We don't really care
>>> whether on_selected_cpus returns before all CPUs have executed the
>>> dummy function, as long as all of them have taken a vmexit. Using 0
>>> already guarantees that AFAICT.
>>
>> Isn't it that the caller wants to be guaranteed that the flush
>> has actually occurred (or at least that no further insns can
>> be executed prior to the flush happening, i.e. at least the VM
>> exit having occurred)?
> 
> Yes, but that would happen with 0 as the last argument already, see
> the code in smp_call_function_interrupt:
> 
>     if ( call_data.wait )
>     {
>         (*func)(info);
>         smp_mb();
>         cpumask_clear_cpu(cpu, &call_data.selected);
>     }
>     else
>     {
>         smp_mb();
>         cpumask_clear_cpu(cpu, &call_data.selected);
>         (*func)(info);
>     }
> 
> The only difference is whether on_selected_cpus can return before all
> the functions have been executed on all CPUs, or whether all CPUs have
> taken a vmexit. The latter is fine for our use-case.

Oh, yes - right you are.

Jan


* Re: [Xen-devel] [PATCH v5 7/7] x86/tlb: use Xen L0 assisted TLB flush when available
  2020-02-28 17:00   ` Jan Beulich
@ 2020-02-28 17:23     ` Roger Pau Monné
  2020-03-02 10:52       ` Jan Beulich
  0 siblings, 1 reply; 33+ messages in thread
From: Roger Pau Monné @ 2020-02-28 17:23 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Wei Liu, Andrew Cooper

On Fri, Feb 28, 2020 at 06:00:44PM +0100, Jan Beulich wrote:
> On 19.02.2020 18:43, Roger Pau Monne wrote:
> > Use Xen's L0 HVMOP_flush_tlbs hypercall in order to perform flushes.
> > This greatly increases the performance of TLB flushes when running
> > with a high amount of vCPUs as a Xen guest, and is specially important
> > when running in shim mode.
> > 
> > The following figures are from a PV guest running `make -j32 xen` in
> > shim mode with 32 vCPUs and HAP.
> > 
> > Using x2APIC and ALLBUT shorthand:
> > real	4m35.973s
> > user	4m35.110s
> > sys	36m24.117s
> > 
> > Using L0 assisted flush:
> > real    1m2.596s
> > user    4m34.818s
> > sys     5m16.374s
> > 
> > The implementation adds a new hook to hypervisor_ops so other
> > enlightenments can also implement such assisted flush just by filling
> > the hook. Note that the Xen implementation completely ignores the
> > dirty CPU mask and the linear address passed in, and always performs a
> > global TLB flush on all vCPUs.
> 
> This isn't because of an implementation choice of yours, but because
> of how HVMOP_flush_tlbs works. I think the statement should somehow
> express this. I also think it wants clarifying that using the
> hypercall is indeed faster even in the case of single-page, single-
> CPU flush

Are you sure about this? I think taking a vmexit is going to be more
costly than executing a local invlpg?

> (which I suspect may not be the case especially as vCPU
> count grows). The stats above prove a positive overall effect, but
> they don't say whether the effect could be even bigger by being at
> least a little selective.

I assume that being able to provide a bitmap indicating which vCPUs the
TLB flush should be performed on would give us some more performance,
but I haven't looked into it yet.

> > @@ -73,6 +74,15 @@ void __init hypervisor_e820_fixup(struct e820map *e820)
> >          ops.e820_fixup(e820);
> >  }
> >  
> > +int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
> > +                         unsigned int order)
> > +{
> > +    if ( ops.flush_tlb )
> > +        return alternative_call(ops.flush_tlb, mask, va, order);
> > +
> > +    return -ENOSYS;
> > +}
> 
> Please no new -ENOSYS anywhere (except in new ports' top level
> hypercall handlers).

Ack. Is EOPNOTSUPP OK?
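
I.e. the wrapper's fallback would become (same as the quoted hunk, just
with the error value adjusted):

    int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
                             unsigned int order)
    {
        if ( ops.flush_tlb )
            return alternative_call(ops.flush_tlb, mask, va, order);

        /* No enlightenment provides an assisted flush. */
        return -EOPNOTSUPP;
    }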

> > @@ -256,6 +257,16 @@ void flush_area_mask(const cpumask_t *mask, const void *va, unsigned int flags)
> >      if ( (flags & ~FLUSH_ORDER_MASK) &&
> >           !cpumask_subset(mask, cpumask_of(cpu)) )
> >      {
> > +        if ( cpu_has_hypervisor &&
> > +             !(flags & ~(FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_VA_VALID |
> > +                         FLUSH_ORDER_MASK)) &&
> > +             !hypervisor_flush_tlb(mask, va, (flags - 1) & FLUSH_ORDER_MASK) )
> > +        {
> > +            if ( tlb_clk_enabled )
> > +                tlb_clk_enabled = false;
> 
> Why does this need doing here? Couldn't Xen guest setup code
> clear the flag?

Because it's possible that the hypercall fails, and hence the tlb
clock should be kept enabled. There's no reason to disable it until
Xen knows the assisted flush is indeed available.

I don't mind moving it to Xen guest setup code, but I'm not sure I see
why it would be any better than doing it here. The only reason I guess
is to avoid checking tlb_clk_enabled on every successful assisted
flush?
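
For comparison, the alternative would look something like this (a sketch
only; the probe helper and its exact placement are hypothetical, not part
of the posted series):

    /* Probe the assisted flush once during Xen guest setup and disable the
     * TLB clock there, rather than on the first successful assisted flush
     * in flush_area_mask(). */
    static void __init probe_assisted_flush(void)
    {
        if ( !hypervisor_flush_tlb(cpumask_of(smp_processor_id()), NULL, 0) )
            tlb_clk_enabled = false;
    }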

Thanks, Roger.


* Re: [Xen-devel] [PATCH v5 4/7] x86/tlb: introduce a flush guests TLB flag
  2020-02-28 16:50     ` Roger Pau Monné
@ 2020-03-02 10:31       ` Jan Beulich
  2020-03-02 16:50         ` Roger Pau Monné
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2020-03-02 10:31 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: George Dunlap, xen-devel, Tim Deegan, Wei Liu, Andrew Cooper

On 28.02.2020 17:50, Roger Pau Monné wrote:
> On Fri, Feb 28, 2020 at 05:14:05PM +0100, Jan Beulich wrote:
>> On 19.02.2020 18:43, Roger Pau Monne wrote:
>>> This was previously unconditionally done in each pre_flush call, but
>>> that's not required: HVM guests not using shadow don't require linear
>>> TLB flushes as Xen doesn't modify the guest page tables in that case
>>> (ie: when using HAP).
>>
>> This explains the correctness in one direction. What about the
>> removal of this from the switch_cr3_cr4() path?
> 
> AFAICT that's never used by shadow code to modify cr3 or cr4, and
> hence doesn't require a guest linear TLB flush.

XSA-294 tells me to be very conservative here. It is not necessarily
the direct use by shadow code that matters; toggle_guest_*() isn't
used directly by it, either.

>> And what about
>> our safety assumptions from the ticking of tlbflush_clock,
>> where we then imply that pages e.g. about to be freed can't
>> have any translations left in any TLBs anymore?
> 
> I'm slightly confused. That flush only affects HVM guests' linear TLBs,
> and hence is not under Xen control unless shadow mode is used. Pages
> to be freed in the HAP case need to be flushed from the EPT/NPT, but
> whether there are references left in the guest TLB to point to that
> gfn really doesn't matter to Xen at all, since the guest is in full
> control of its MMU and TLB in that case.

Ah yes, sorry, I probably didn't get my thinking right around
combined mappings and when they get invalidated.

>>> --- a/xen/include/asm-x86/flushtlb.h
>>> +++ b/xen/include/asm-x86/flushtlb.h
>>> @@ -105,6 +105,8 @@ void switch_cr3_cr4(unsigned long cr3, unsigned long cr4);
>>>  #define FLUSH_VCPU_STATE 0x1000
>>>   /* Flush the per-cpu root page table */
>>>  #define FLUSH_ROOT_PGTBL 0x2000
>>> + /* Flush all HVM guests linear TLB (using ASID/VPID) */
>>> +#define FLUSH_GUESTS_TLB 0x4000
>>
>> For one, the "all" is pretty misleading. A single such request
>> doesn't do this for all vCPU-s of all HVM guests, does it?
> 
> It kind of does because it tickles the pCPU ASID/VPID generation ID,
> so any vCPU scheduled on the selected pCPUs will get a new ASID/VPID
> allocated and thus a clean TLB.
> 
>> I'm
>> also struggling with the 'S' in "GUESTS" - why is it not just
>> "GUEST"? 
> 
> Any guest vCPUs running on the selected pCPUs will get a new ASID/VPID
> ID and thus a clean TLB.

Right, I see. Yet ...

>> I admit the names of the involved functions
>> (hvm_flush_guest_tlbs(), hvm_asid_flush_core()) are somewhat
>> misleading, as they don't actually do any flushing, they merely
>> arrange for what is in the TLB to no longer be able to be used,
>> so giving this a suitable name is "historically" complicated.
>> What if we did away with the hvm_flush_guest_tlbs() wrapper,
>> naming the constant here then after hvm_asid_flush_core(), e.g.
>> FLUSH_ASID_CORE?
> 
> I'm not opposed to renaming. The comment before the definition was
> also meant to clarify it's usage, and hence the explicit mention of
> ASID/VPID.

... there's also one more argument for renaming: The present
name doesn't convey at all that this operation is HVM-only
(i.e. PV guests wouldn't have their TLBs [as far as one can
call them "their"] flushed).

Jan


* Re: [Xen-devel] [PATCH v5 6/7] xen/guest: prepare hypervisor ops to use alternative calls
  2020-02-28 16:52     ` Roger Pau Monné
@ 2020-03-02 10:43       ` Jan Beulich
  2020-03-02 11:48         ` Roger Pau Monné
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2020-03-02 10:43 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Paul Durrant, Wei Liu, Andrew Cooper

On 28.02.2020 17:52, Roger Pau Monné wrote:
> On Fri, Feb 28, 2020 at 05:29:32PM +0100, Jan Beulich wrote:
>> On 19.02.2020 18:43, Roger Pau Monne wrote:
>>> --- a/xen/arch/x86/guest/hyperv/hyperv.c
>>> +++ b/xen/arch/x86/guest/hyperv/hyperv.c
>>> @@ -199,7 +199,7 @@ static void __init e820_fixup(struct e820map *e820)
>>>          panic("Unable to reserve Hyper-V hypercall range\n");
>>>  }
>>>  
>>> -static const struct hypervisor_ops ops = {
>>> +static const struct hypervisor_ops __initdata ops = {
>>
>> This needs to be __initconstrel in order to avoid triggering
>> (possibly only in the future) section mismatch warnings with
>> at least some gcc versions. With this and the other instance
>> adjusted
> 
> I can do that when posting a new version, unless you want to pick this
> earlier and adjust on commit.

Is this to mean that this 2nd to last patch in the series is
fully independent of the earlier five (also contextually)?
Then of course I'd be fine to make the adjustments and commit.
Please confirm if so.

Jan


* Re: [Xen-devel] [PATCH v5 7/7] x86/tlb: use Xen L0 assisted TLB flush when available
  2020-02-28 17:23     ` Roger Pau Monné
@ 2020-03-02 10:52       ` Jan Beulich
  0 siblings, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2020-03-02 10:52 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Wei Liu, Andrew Cooper

On 28.02.2020 18:23, Roger Pau Monné wrote:
> On Fri, Feb 28, 2020 at 06:00:44PM +0100, Jan Beulich wrote:
>> On 19.02.2020 18:43, Roger Pau Monne wrote:
>>> Use Xen's L0 HVMOP_flush_tlbs hypercall in order to perform flushes.
>>> This greatly increases the performance of TLB flushes when running
>>> with a high amount of vCPUs as a Xen guest, and is specially important
>>> when running in shim mode.
>>>
>>> The following figures are from a PV guest running `make -j32 xen` in
>>> shim mode with 32 vCPUs and HAP.
>>>
>>> Using x2APIC and ALLBUT shorthand:
>>> real	4m35.973s
>>> user	4m35.110s
>>> sys	36m24.117s
>>>
>>> Using L0 assisted flush:
>>> real    1m2.596s
>>> user    4m34.818s
>>> sys     5m16.374s
>>>
>>> The implementation adds a new hook to hypervisor_ops so other
>>> enlightenments can also implement such assisted flush just by filling
>>> the hook. Note that the Xen implementation completely ignores the
>>> dirty CPU mask and the linear address passed in, and always performs a
>>> global TLB flush on all vCPUs.
>>
>> This isn't because of an implementation choice of yours, but because
>> of how HVMOP_flush_tlbs works. I think the statement should somehow
>> express this. I also think it wants clarifying that using the
>> hypercall is indeed faster even in the case of single-page, single-
>> CPU flush
> 
> Are you sure about this? I think taking a vmexit is going to be more
> costly than executing a local invlpg?

The answer to this was already ...

>> (which I suspect may not be the case especially as vCPU
>> count grows).

... here (or at least it was meant to address questions back
like this one).

>> The stats above prove a positive overall effect, but
>> they don't say whether the effect could be even bigger by being at
>> least a little selective.
> 
> I assume that being able to provide a bitmap indicating which vCPUs the
> TLB flush should be performed on would give us some more performance,
> but I haven't looked into it yet.
> 
>>> @@ -73,6 +74,15 @@ void __init hypervisor_e820_fixup(struct e820map *e820)
>>>          ops.e820_fixup(e820);
>>>  }
>>>  
>>> +int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
>>> +                         unsigned int order)
>>> +{
>>> +    if ( ops.flush_tlb )
>>> +        return alternative_call(ops.flush_tlb, mask, va, order);
>>> +
>>> +    return -ENOSYS;
>>> +}
>>
>> Please no new -ENOSYS anywhere (except in new ports' top level
>> hypercall handlers).
> 
> Ack. Is EOPNOTSUPP OK?

Sure.
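
For reference, the wrapper with the agreed errno change would look like this
minimal sketch:

    int hypervisor_flush_tlb(const cpumask_t *mask, const void *va,
                             unsigned int order)
    {
        if ( ops.flush_tlb )
            return alternative_call(ops.flush_tlb, mask, va, order);

        /* No assisted flush provided by the underlying hypervisor. */
        return -EOPNOTSUPP;
    }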

>>> @@ -256,6 +257,16 @@ void flush_area_mask(const cpumask_t *mask, const void *va, unsigned int flags)
>>>      if ( (flags & ~FLUSH_ORDER_MASK) &&
>>>           !cpumask_subset(mask, cpumask_of(cpu)) )
>>>      {
>>> +        if ( cpu_has_hypervisor &&
>>> +             !(flags & ~(FLUSH_TLB | FLUSH_TLB_GLOBAL | FLUSH_VA_VALID |
>>> +                         FLUSH_ORDER_MASK)) &&
>>> +             !hypervisor_flush_tlb(mask, va, (flags - 1) & FLUSH_ORDER_MASK) )
>>> +        {
>>> +            if ( tlb_clk_enabled )
>>> +                tlb_clk_enabled = false;
>>
>> Why does this need doing here? Couldn't Xen guest setup code
>> clear the flag?
> 
> Because it's possible that the hypercalls fails, and hence the tlb
> clock should be kept enabled. There's no reason to disable it until
> Xen knows the assisted flush is indeed available.
> 
> I don't mind moving it to Xen guest setup code, but I'm not sure I see
> why it would be any better than doing it here. The only reason I guess
> is to avoid checking tlb_clk_enabled on every successful assisted
> flush?

Well, iirc there had already been a question on why the if() here
is needed. I second the reason, but the whole construct looks
misplaced, i.e. is prone to cause further questions down the road.
I think if it was to stay here, a comment would be needed to
address any such possible questions. But I still think it should
live in the init code. This isn't a feature that simply lacks a
CPUID bit (or the like) and hence needs probing (unless of course
we expect people to want to put a modern Xen [shim] on top of a
pre-3.<whatever> Xen; and even then you could probe the
underlying hypercall once at boot time).

Jan
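
One possible shape of the alternative suggested here, i.e. probing the
underlying hypercall once during Xen-guest setup and disabling the TLB clock
there rather than in flush_area_mask(), is sketched below. The function name
and placement are illustrative only, not the committed code:

    /* Illustrative: probe the L0 assisted flush once at boot. */
    static void __init probe_assisted_flush(void)
    {
        /* A successful flush means the assist is available, so the   */
        /* TLB clock serves no further purpose and can be turned off. */
        if ( xen_hypercall_hvm_op(HVMOP_flush_tlbs, NULL) == 0 )
            tlb_clk_enabled = false;
    }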


* Re: [Xen-devel] [PATCH v5 6/7] xen/guest: prepare hypervisor ops to use alternative calls
  2020-03-02 10:43       ` Jan Beulich
@ 2020-03-02 11:48         ` Roger Pau Monné
  0 siblings, 0 replies; 33+ messages in thread
From: Roger Pau Monné @ 2020-03-02 11:48 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Paul Durrant, Wei Liu, Andrew Cooper

On Mon, Mar 02, 2020 at 11:43:14AM +0100, Jan Beulich wrote:
> On 28.02.2020 17:52, Roger Pau Monné wrote:
> > On Fri, Feb 28, 2020 at 05:29:32PM +0100, Jan Beulich wrote:
> >> On 19.02.2020 18:43, Roger Pau Monne wrote:
> >>> --- a/xen/arch/x86/guest/hyperv/hyperv.c
> >>> +++ b/xen/arch/x86/guest/hyperv/hyperv.c
> >>> @@ -199,7 +199,7 @@ static void __init e820_fixup(struct e820map *e820)
> >>>          panic("Unable to reserve Hyper-V hypercall range\n");
> >>>  }
> >>>  
> >>> -static const struct hypervisor_ops ops = {
> >>> +static const struct hypervisor_ops __initdata ops = {
> >>
> >> This needs to be __initconstrel in order to avoid triggering
> >> (possibly only in the future) section mismatch warnings with
> >> at least some gcc versions. With this and the other instance
> >> adjusted
> > 
> > I can do that when posting a new version, unless you want to pick this
> > earlier and adjust on commit.
> 
> Is this to mean that this 2nd to last patch in the series is
> fully independent of the earlier five (also contextually)?

Right, patches 5 to 7 should be completely independent, as they only
modify code related to Xen running as a guest.

> Then of course I'd be fine to make the adjustments and commit.
> Please confirm if so.

Yes please, do the adjustments on commit if you don't mind.

Thanks, Roger.
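
For reference, the adjustment agreed above only changes the section
annotation of the (const) ops object; roughly, with the initializers shown
purely for illustration:

    /* __initconstrel rather than __initdata, as the object is const. */
    static const struct hypervisor_ops __initconstrel ops = {
        .setup = setup,
        .e820_fixup = e820_fixup,
    };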


* Re: [Xen-devel] [PATCH v5 4/7] x86/tlb: introduce a flush guests TLB flag
  2020-03-02 10:31       ` Jan Beulich
@ 2020-03-02 16:50         ` Roger Pau Monné
  2020-03-03  8:50           ` Jan Beulich
  0 siblings, 1 reply; 33+ messages in thread
From: Roger Pau Monné @ 2020-03-02 16:50 UTC (permalink / raw)
  To: Jan Beulich; +Cc: George Dunlap, xen-devel, Tim Deegan, Wei Liu, Andrew Cooper

On Mon, Mar 02, 2020 at 11:31:23AM +0100, Jan Beulich wrote:
> On 28.02.2020 17:50, Roger Pau Monné wrote:
> > On Fri, Feb 28, 2020 at 05:14:05PM +0100, Jan Beulich wrote:
> >> On 19.02.2020 18:43, Roger Pau Monne wrote:
> >>> This was previously unconditionally done in each pre_flush call, but
> >>> that's not required: HVM guests not using shadow don't require linear
> >>> TLB flushes as Xen doesn't modify the guest page tables in that case
> >>> (ie: when using HAP).
> >>
> >> This explains the correctness in one direction. What about the
> >> removal of this from the switch_cr3_cr4() path?
> > 
> > AFAICT that's never used by shadow code to modify cr3 or cr4, and
> > hence doesn't require a guest linear TLB flush.
> 
> XSA-294 tells me to be very conservative here. It is not necessarily
> the direct use by shadow code that matters; toggle_guest_*() isn't
> used directly by it, either.

toggle_guest_{mode/pt} seems to be exclusively used by PV guests. I'm
fine with adding extra flushes to be on the safe side, but those
functions are never used against a HVM guest AFAICT. The only reason
to flush a HVM guest 'internal' TLB is when using shadow, and in that
case the shadow code must already take care of issuing such flushes.

> >> I admit the names of the involved functions
> >> (hvm_flush_guest_tlbs(), hvm_asid_flush_core()) are somewhat
> >> misleading, as they don't actually do any flushing, they merely
> >> arrange for what is in the TLB to no longer be able to be used,
> >> so giving this a suitable name is "historically" complicated.
> >> What if we did away with the hvm_flush_guest_tlbs() wrapper,
> >> naming the constant here then after hvm_asid_flush_core(), e.g.
> >> FLUSH_ASID_CORE?
> > 
> > I'm not opposed to renaming. The comment before the definition was
> > also meant to clarify its usage, and hence the explicit mention of
> > ASID/VPID.
> 
> ... there's also one more argument for renaming: The present
> name doesn't convey at all that this operation is HVM-only
> (i.e. PV guests wouldn't have their TLBs [as far as one can
> call them "their"] flushed).

Do you think FLUSH_ASID_CORE is clear enough, or would you prefer
FLUSH_HVM_ASID_CORE?

Thanks, Roger.


* Re: [Xen-devel] [PATCH v5 4/7] x86/tlb: introduce a flush guests TLB flag
  2020-03-02 16:50         ` Roger Pau Monné
@ 2020-03-03  8:50           ` Jan Beulich
  0 siblings, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2020-03-03  8:50 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: George Dunlap, xen-devel, Tim Deegan, Wei Liu, Andrew Cooper

On 02.03.2020 17:50, Roger Pau Monné wrote:
> On Mon, Mar 02, 2020 at 11:31:23AM +0100, Jan Beulich wrote:
>> On 28.02.2020 17:50, Roger Pau Monné wrote:
>>> On Fri, Feb 28, 2020 at 05:14:05PM +0100, Jan Beulich wrote:
>>>> On 19.02.2020 18:43, Roger Pau Monne wrote:
>>>>> This was previously unconditionally done in each pre_flush call, but
>>>>> that's not required: HVM guests not using shadow don't require linear
>>>>> TLB flushes as Xen doesn't modify the guest page tables in that case
>>>>> (ie: when using HAP).
>>>>
>>>> This explains the correctness in one direction. What about the
>>>> removal of this from the switch_cr3_cr4() path?
>>>
>>> AFAICT that's never used by shadow code to modify cr3 or cr4, and
>>> hence doesn't require a guest linear TLB flush.
>>
>> XSA-294 tells me to be very conservative here. It is not necessarily
>> the direct use by shadow code that matters; toggle_guest_*() isn't
>> used directly by it, either.
> 
> toggle_guest_{mode/pt} seems to be exclusively used by PV guests. I'm
> fine with adding extra flushes to be on the safe side, but those
> functions are never used against a HVM guest AFAICT. The only reason
> to flush a HVM guest 'internal' TLB is when using shadow, and in that
> case the shadow code must already take care of issuing such flushes.

What I'm asking for primarily is to extend the description. If it
is clear enough, it ought to also be clear enough that no flushes
need inserting anywhere.

>>>> I admit the names of the involved functions
>>>> (hvm_flush_guest_tlbs(), hvm_asid_flush_core()) are somewhat
>>>> misleading, as they don't actually do any flushing, they merely
>>>> arrange for what is in the TLB to no longer be able to be used,
>>>> so giving this a suitable name is "historically" complicated.
>>>> What if we did away with the hvm_flush_guest_tlbs() wrapper,
>>>> naming the constant here then after hvm_asid_flush_core(), e.g.
>>>> FLUSH_ASID_CORE?
>>>
>>> I'm not opposed to renaming. The comment before the definition was
>>> also meant to clarify its usage, and hence the explicit mention of
>>> ASID/VPID.
>>
>> ... there's also one more argument for renaming: The present
>> name doesn't convey at all that this operation is HVM-only
>> (i.e. PV guests wouldn't have their TLBs [as far as one can
>> call them "their"] flushed).
> 
> Do you think FLUSH_ASID_CORE is clear enough, or would you prefer
> FLUSH_HVM_ASID_CORE?

While in principle in our code base ASID implies HVM, perhaps the
latter would still be even slightly better.

Jan
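
For context, the flag being renamed gates the ASID/VPID generation bump in
the flush path, so with the name settled on here the consumer would look
roughly like the sketch below (not the exact committed hunk):

    /*
     * In flush_area_local() (sketch): only HVM guests have an ASID/VPID
     * to invalidate; the "flush" is really a bump of the ASID generation.
     */
    if ( flags & FLUSH_HVM_ASID_CORE )
        hvm_flush_guest_tlbs();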


end of thread, other threads:[~2020-03-03  8:51 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-19 17:43 [Xen-devel] [PATCH v5 0/7] x86: improve assisted tlb flush and use it in guest mode Roger Pau Monne
2020-02-19 17:43 ` [Xen-devel] [PATCH v5 1/7] x86/hvm: allow ASID flush when v != current Roger Pau Monne
2020-02-28 13:29   ` Jan Beulich
2020-02-28 15:27     ` Roger Pau Monné
2020-02-28 15:47       ` Jan Beulich
2020-02-28 16:20         ` Roger Pau Monné
2020-02-19 17:43 ` [Xen-devel] [PATCH v5 2/7] x86/paging: add TLB flush hooks Roger Pau Monne
2020-02-28 14:50   ` Jan Beulich
2020-02-28 16:19     ` Roger Pau Monné
2020-02-28 16:40       ` Jan Beulich
2020-02-19 17:43 ` [Xen-devel] [PATCH v5 3/7] x86/hap: improve hypervisor assisted guest TLB flush Roger Pau Monne
2020-02-28 13:58   ` Jan Beulich
2020-02-28 16:31     ` Roger Pau Monné
2020-02-28 16:44       ` Jan Beulich
2020-02-28 16:57         ` Roger Pau Monné
2020-02-28 17:02           ` Jan Beulich
2020-02-19 17:43 ` [Xen-devel] [PATCH v5 4/7] x86/tlb: introduce a flush guests TLB flag Roger Pau Monne
2020-02-28 16:14   ` Jan Beulich
2020-02-28 16:50     ` Roger Pau Monné
2020-03-02 10:31       ` Jan Beulich
2020-03-02 16:50         ` Roger Pau Monné
2020-03-03  8:50           ` Jan Beulich
2020-02-19 17:43 ` [Xen-devel] [PATCH v5 5/7] x86/tlb: allow disabling the TLB clock Roger Pau Monne
2020-02-28 16:25   ` Jan Beulich
2020-02-19 17:43 ` [Xen-devel] [PATCH v5 6/7] xen/guest: prepare hypervisor ops to use alternative calls Roger Pau Monne
2020-02-28 16:29   ` Jan Beulich
2020-02-28 16:52     ` Roger Pau Monné
2020-03-02 10:43       ` Jan Beulich
2020-03-02 11:48         ` Roger Pau Monné
2020-02-19 17:43 ` [Xen-devel] [PATCH v5 7/7] x86/tlb: use Xen L0 assisted TLB flush when available Roger Pau Monne
2020-02-28 17:00   ` Jan Beulich
2020-02-28 17:23     ` Roger Pau Monné
2020-03-02 10:52       ` Jan Beulich
